Version Overview:

This version performs the following:

  1. Top feature selection based on trained models’ feature importance.

    The selection depends on the number of CpGs kept and on the feature selection method used.

    The feature selection methods serve two purposes: one set is for binary classification, the other is for multi-class classification.

  2. Top feature selection based on trained models’ feature importance, using different selection methods.

    There are several selection methods, for example mean feature importance, median quantile feature importance, and frequency / common feature importance.

    • The frequency / common feature importance is computed as follows:
      1. Select the top N features (say 40) for each model.
      2. Compute how often each feature appears among the top-N sets from step 1.
      3. Any feature that appears in more than half of the models is considered important; these important features are collected as the common features.
  3. Output two data frames that will be used for the Pareto-optimal analysis.

    One is the filtered data frame containing the top N features for each selection method.

    The other is the phenotype data frame.

  4. Evaluate the performance of the features selected by each of the three methods.
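
The frequency / common-feature procedure described in item 2 can be sketched in R as follows. This is a minimal sketch: `importance_list` (a named list with one importance vector per model) and the function name are assumptions for illustration, not objects from the pipeline.

```r
# Sketch of frequency / common-feature selection.
# `importance_list`: named list, one numeric vector of feature importances
# per model; the vector names are the feature names. (Assumed structure.)
select_common_features <- function(importance_list, top_n = 40) {
  # Step 1: top-N feature names for each model
  top_sets <- lapply(importance_list, function(imp) {
    names(sort(imp, decreasing = TRUE))[seq_len(min(top_n, length(imp)))]
  })
  # Step 2: how often each feature appears across the top-N sets
  freq <- table(unlist(top_sets))
  # Step 3: keep features appearing in more than half of the models
  names(freq)[freq > length(importance_list) / 2]
}
```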

Input Session

This part collects the inputs; change them as needed.

File Path :

csv_Ni1905FilePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\ADNI_covariate_withEpiage_1905obs.csv"

TopSelectedCpGs_filePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\Top5K_CpGs.csv"

Number of top CpGs kept:

# Number of top CpGs kept, ranked by standard deviation
Number_N_TopNCpGs<-params$INPUT_Number_N_TopNCpGs

Session Input:

Session 1.6.1 Missing Value

# GO TO the INPUT Session and find "Impute_NA_FLAG_NUM":
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"

Impute_NA_FLAG_NUM = 1

Session 1.6.2 Feature Selection

# GO TO the INPUT Session and find "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# for classification with CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# for classification with CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# for classification with MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"

METHOD_FEATURE_FLAG_NUM = 4

Session 7.0 Important Features

# GOTO "INPUT" Session to set the Number of common features needed
# Generally this is for visualization

NUM_COMMON_FEATURES_SET = 20
NUM_COMMON_FEATURES_SET_Frequency = 20

Session 8.0 Feature Selection and Output

The feature selection methods:

  1. based on mean feature importance (set “INPUT_Method_Mean_Choose = TRUE”)
  2. based on median quantile feature importance (set “INPUT_Method_Median_Choose = TRUE”)
  3. based on feature frequency importance (set “INPUT_Method_Frequency_Choose = TRUE”)
    • Comment: with the frequency method, the input number of features N is only used in the first step (select the top N features for each model), so the final number of features kept may not be exactly N.
  4. Setting a method’s flag to FALSE skips the output for that method; to output the data for every method, set all flags to TRUE. In summary, set a flag to TRUE to output the data set selected by the corresponding method.
# This is the flag for phenotype data output.
# If set to TRUE, check whether the file already exists at the given path; if not, write it, otherwise do nothing.
# If set to FALSE, do not output the phenotype file.
# NOTICE THAT: the phenotype file is selected from "Merged_df".

phenoOutPUt_FLAG = TRUE
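
The write-if-absent behavior described in the comments above can be sketched as follows (an illustrative helper, not the pipeline’s actual code):

```r
# Write `df` to `path` only if no file exists there yet, mirroring the
# behavior described above. Returns TRUE if written, FALSE if skipped.
write_if_absent <- function(df, path) {
  if (file.exists(path)) {
    message("File already exists; not overwritten: ", path)
    return(invisible(FALSE))
  }
  write.csv(df, path, row.names = TRUE)
  invisible(TRUE)
}
```

For example, `write_if_absent(phenoticPart_RAW, file.path(out_dir, "phenotype.csv"))`, where `out_dir` is an assumed output directory.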
  

  
# For 8.0 Feature Selection and Output :
# NUM_FEATURES <- INPUT_NUMBER_FEATURES
#   This is the number of features needed
# Method_Selected_Choose <- INPUT_Method_Selected_Choose
#   This is the method used for the output-stage feature selection


INPUT_NUMBER_FEATURES = params$INPUT_OUT_NUMBER_FEATURES
INPUT_Method_Mean_Choose = TRUE
INPUT_Method_Median_Choose = TRUE
INPUT_Method_Frequency_Choose = TRUE


if(INPUT_Method_Mean_Choose|| INPUT_Method_Median_Choose || INPUT_Method_Frequency_Choose){
  OUTUT_file_directory<- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method4_CN_vs_AD\\Method4_CN_vs_AD_SelectedFeatures\\"
  OUTUT_CSV_PATHNAME <- paste(OUTUT_file_directory,"INPUT_",Number_N_TopNCpGs,"CpGs\\",sep="")
  
  if (dir.exists(OUTUT_CSV_PATHNAME)) {
    message("Directory already exists.")
    } else {
    dir.create(OUTUT_CSV_PATHNAME, recursive = TRUE)
    message("Directory created.")
    }
  
}
## Directory already exists.

Session 10.0 Performance Metrics

FLAG_WRITE_METRICS_DF is the flag controlling whether to write the CSV that contains the performance metrics.

# This is the flag for outputting this file's metrics, including the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.

Metrics_Table_Output_FLAG = TRUE


FLAG_WRITE_METRICS_DF = TRUE



if(FLAG_WRITE_METRICS_DF){
  OUTUT_PerfMertics_directory<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method4_CN_vs_AD\\Method4_CN_vs_AD_PerformanceMetrics\\"
  
  OUTUT_PerformanceMetricsCSV_PATHNAME <- paste(OUTUT_PerfMertics_directory,"INPUT_",Number_N_TopNCpGs,"CpGs_",INPUT_NUMBER_FEATURES,"SelFeature_PerMetrics.csv",sep="")
  
  if (dir.exists(OUTUT_PerfMertics_directory)) {
    message("Directory already exists.")
    } else {
    dir.create(OUTUT_PerfMertics_directory, recursive = TRUE)
    message("Directory created.")
    }
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
  
}
## Directory already exists.
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method4_CN_vs_AD\\Method4_CN_vs_AD_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"

1. Preprocess

Packages and libraries that may need to be installed and loaded.

# Function to check and install Bioconductor package: "limma"

install_bioc_packages <- function(packages) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      BiocManager::install(pkg, dependencies = TRUE)
    } else {
      message(paste("Package", pkg, "is already installed."))
    }
  }
}


install_bioc_packages("limma")
## Package limma is already installed.
print("The required packages are all successfully installed.")
## [1] "The required packages are all successfully installed."
library(limma)

Set the seed for reproducibility.

set.seed(123)

1.1 Data Read and Preview

csv_NI1905<-read.csv(csv_Ni1905FilePath)
csv_NI1905_RAW <- csv_NI1905
TopSelectedCpGs<-read.csv(TopSelectedCpGs_filePath, check.names = FALSE)
TopSelectedCpGs_RAW <- TopSelectedCpGs

1.1.1 csv_NI1905 (“ADNI_covariate_withEpiage_1905obs.csv”)

head(csv_NI1905,n=3)
rownames(csv_NI1905)<-as.matrix(csv_NI1905[,"barcodes"])
dim(csv_NI1905)
## [1] 1905   23

1.1.2 TopSelectedCpGs

dim(TopSelectedCpGs)
## [1] 5000 1921
head(TopSelectedCpGs[,1:8])
rownames(TopSelectedCpGs)<-TopSelectedCpGs[,1]
head(rownames(TopSelectedCpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopSelectedCpGs))
## [1] "ProbeID"             "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopSelectedCpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"

1.1.3 “TopN_CpGs”

1.1.3.1 Select Top N CpGs

This part adjusts the set of CpGs to use: it keeps the top N CpGs ranked by standard deviation.

sorted_TopSelectedCpGs <- TopSelectedCpGs[order(-TopSelectedCpGs$sdDev), ]
TopN_CpGs <- head(sorted_TopSelectedCpGs,Number_N_TopNCpGs )
TopN_CpGs_RAW<-TopN_CpGs

The variable “TopN_CpGs” will be used for processing the data. Let’s take a look at it.

1.1.3.2 Preview “TopN_CpGs”

dim(TopN_CpGs)
## [1] 5000 1921
rownames(TopN_CpGs)<-TopN_CpGs[,1]
head(rownames(TopN_CpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopN_CpGs))
## [1] "ProbeID"             "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopN_CpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"

1.2 Check Duplicates

Now, let’s check for duplicate Sample IDs (“barcodes”):

Start with subjects that lack a unique ID (“uniqueID = 0”):

library(dplyr)
dim(csv_NI1905[csv_NI1905$uniqueID == 0, ])
## [1] 1256   23
dim(csv_NI1905[csv_NI1905$uniqueID == 1, ])
## [1] 649  23
duplicates <-  csv_NI1905[csv_NI1905$uniqueID == 0, ] %>%
  group_by(barcodes) %>%
  filter(n() > 1) %>%
  ungroup()

print(dim(duplicates))
## [1]  0 23
rm(duplicates)

Based on the output dimensions, these records all have distinct Sample IDs (“barcodes”).

Then check all records for duplicated Sample IDs (“barcodes”).

duplicates <-  csv_NI1905 %>%
  group_by(barcodes) %>%
  filter(n() > 1) %>%
  ungroup()
print(dim(duplicates))
## [1]  0 23

From the above output, we can see the Sample IDs (“barcodes”) are unique.

names(csv_NI1905)
##  [1] "barcodes"    "RID.a"       "prop.B"      "prop.NK"     "prop.CD4T"   "prop.CD8T"   "prop.Mono"   "prop.Neutro" "prop.Eosino" "DX"          "age.now"     "PTGENDER"    "ABETA"       "TAU"        
## [15] "PTAU"        "PC1"         "PC2"         "PC3"         "ageGroup"    "ageGroupsq"  "DX_num"      "uniqueID"    "Horvath"

The same person may appear at multiple time points, so we keep only the records with a unique ID (“uniqueID = 1”).

csv_NI1905<-csv_NI1905[csv_NI1905$uniqueID == 1, ]
dim(csv_NI1905)
## [1] 649  23

1.3 Remove NA values

Since “DX” will be the response variable, we first remove all rows with an NA value in the “DX” column.

# "DX" will be Y,remove all rows with NA value in "DX" column
csv_NI1905<-csv_NI1905 %>% filter(!is.na(DX)) 

1.4 Sample Name filtering

We keep only the samples that appear in both datasets.

Matrix_sample_names_NI1905 <- as.matrix(csv_NI1905[,"barcodes"])
Matrix_sample_names_TopN_CpGs <- as.matrix(colnames(TopN_CpGs))
common_sample_names<-intersect(Matrix_sample_names_NI1905,Matrix_sample_names_TopN_CpGs)
csv_NI1905 <- csv_NI1905 %>% filter(barcodes %in% common_sample_names)
TopN_CpGs <- TopN_CpGs[, common_sample_names, drop = FALSE]
head(TopN_CpGs[,1:3],n=2)
dim(TopN_CpGs)
## [1] 5000  648
dim(csv_NI1905)
## [1] 648  23

1.5 Merged DataFrame

1.5.1 Merge two datasets

Merge these two datasets and store the result in “merged_df”.

trans_TopN_CpGs<-t(TopN_CpGs)

# Check the total length of the rownames.
# Recall that the sample names have been matched and neither data frame has duplicates.
# Now, order the rownames and bind the data frames together. This ensures the merged data frame matches the two source data frames row by row.

trans_TopN_CpGs_ordered<-trans_TopN_CpGs[order(rownames(trans_TopN_CpGs)),]
csv_NI1905_ordered<-csv_NI1905[order(rownames(csv_NI1905)),]
print("The rownames matchs in order:")
## [1] "The rownames matchs in order:"
check_1 = length(rownames(csv_NI1905_ordered))
check_2 = sum(rownames(csv_NI1905_ordered)==rownames(trans_TopN_CpGs_ordered))
print(check_1==check_2)
## [1] TRUE
merged_df_raw<-cbind(trans_TopN_CpGs_ordered,csv_NI1905_ordered)
phenotic_features_RAW<-colnames(csv_NI1905)
print(phenotic_features_RAW)
##  [1] "barcodes"    "RID.a"       "prop.B"      "prop.NK"     "prop.CD4T"   "prop.CD8T"   "prop.Mono"   "prop.Neutro" "prop.Eosino" "DX"          "age.now"     "PTGENDER"    "ABETA"       "TAU"        
## [15] "PTAU"        "PC1"         "PC2"         "PC3"         "ageGroup"    "ageGroupsq"  "DX_num"      "uniqueID"    "Horvath"
phenoticPart_RAW <- merged_df_raw[,phenotic_features_RAW]
dim(phenoticPart_RAW)
## [1] 648  23
head(phenoticPart_RAW)
head(merged_df_raw[,1:3])
merged_df<-merged_df_raw

1.5.2 “merged_df”

head(colnames(merged_df))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"

1.5.3 Feature Names

(1) CpGs (beta values)

The CpG feature names can be accessed via “featureName_CpGs”:

featureName_CpGs<-rownames(TopN_CpGs)
length(featureName_CpGs)
## [1] 5000
head(featureName_CpGs)
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"

1.6 Clean Merged datasets

clean_merged_df<-merged_df

1.6.1 Missing Value

missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## ABETA   TAU  PTAU 
##   109   109   109

Choose Output Data

Choose the imputation method to apply to the data. The output dataset is named “clean_merged_df”.

# GO TO the INPUT Session and find "Impute_NA_FLAG_NUM":
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"

Impute_NA_FLAG = Impute_NA_FLAG_NUM

(1) Impute with Mean

if (Impute_NA_FLAG == 1){
  clean_merged_df_imputed_mean<-clean_merged_df

  mean_ABETA_rmNA <- mean(clean_merged_df$ABETA, na.rm = TRUE)
  clean_merged_df_imputed_mean$ABETA[
    is.na(clean_merged_df_imputed_mean$ABETA)] <- mean_ABETA_rmNA

  mean_TAU_rmNA <- mean(clean_merged_df$TAU, na.rm = TRUE)
  clean_merged_df_imputed_mean$TAU[
    is.na(clean_merged_df_imputed_mean$TAU)] <- mean_TAU_rmNA

  mean_PTAU_rmNA <- mean(clean_merged_df$PTAU, na.rm = TRUE)
  clean_merged_df_imputed_mean$PTAU[
    is.na(clean_merged_df_imputed_mean$PTAU)] <- mean_PTAU_rmNA
  
  clean_merged_df = clean_merged_df_imputed_mean 
}
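
The three repeated mean-imputation assignments above can equivalently be written as a loop over the affected columns; a minimal sketch, where `df` stands in for `clean_merged_df`:

```r
# Mean imputation as a loop: same behavior as the three blocks above.
impute_means <- function(df, cols = c("ABETA", "TAU", "PTAU")) {
  for (col in cols) {
    # column mean computed on the observed (non-NA) values
    col_mean <- mean(df[[col]], na.rm = TRUE)
    df[[col]][is.na(df[[col]])] <- col_mean
  }
  df
}
```

Calling `clean_merged_df <- impute_means(clean_merged_df)` would reproduce the result of the block above.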

(2) Impute with KNN

library(VIM)
if (Impute_NA_FLAG == 2){
  df_imputed_KNN <- kNN(merged_df, k = 5)
  imputed_summary <- colSums(df_imputed_KNN[, grep("_imp", names(df_imputed_KNN))])
  print(imputed_summary[imputed_summary > 0])
  clean_merged_df<-df_imputed_KNN[, -grep("_imp", names(df_imputed_KNN))]
}

Check the missing value problem solved

missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## named numeric(0)

1.6.2 Feature Selection

Choose Method Use

Choose the feature selection method to use.

# GO TO the INPUT Session and find "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# for classification with CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# for classification with CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# for classification with MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"

METHOD_FEATURE_FLAG = METHOD_FEATURE_FLAG_NUM

(1) Method One

if (METHOD_FEATURE_FLAG ==  1){
  df_fs_method1 <- clean_merged_df
}
Picking Features
if(METHOD_FEATURE_FLAG ==  1){
  
  phenotic_features_m1<-c("DX","age.now","PTGENDER",
                          "PC1","PC2","PC3")
  pickedFeatureName_m1<-c(phenotic_features_m1,featureName_CpGs)
  df_fs_method1<-clean_merged_df[,pickedFeatureName_m1]
  df_fs_method1$DX<-as.factor(df_fs_method1$DX)
  df_fs_method1$PTGENDER<-as.factor(df_fs_method1$PTGENDER)
  head(df_fs_method1[,1:5],n=3)
  dim(df_fs_method1)
}
if(METHOD_FEATURE_FLAG ==  1){
  dim(df_fs_method1)
}
Perform DMP - Use LIMMA

Create contrast matrix for comparing CN vs Dementia vs MCI

if(METHOD_FEATURE_FLAG == 1){

  pheno_data_m1 <- df_fs_method1[,phenotic_features_m1] 
  head(pheno_data_m1[,1:5],n=3)
  
  pheno_data_m1$DX <- factor(pheno_data_m1$DX, levels = c("CN", "MCI", "Dementia"))
  design_m1 <- model.matrix(~ 0 + DX + age.now + PTGENDER + PC1 + PC2 + PC3,
                         data = pheno_data_m1)

  colnames(design_m1)[colnames(design_m1) == "DXCN"] <- "CN"
  colnames(design_m1)[colnames(design_m1) == "DXDementia"] <- "Dementia"
  colnames(design_m1)[colnames(design_m1) == "DXMCI"] <- "MCI"

  head(design_m1)
  
  cpg_matrix_m1 <- t(as.matrix(df_fs_method1[, featureName_CpGs]))
  fit_m1 <- lmFit(cpg_matrix_m1, design_m1)


}
if(METHOD_FEATURE_FLAG == 1){
  # for here, we have three labels. The contrasts to compare groups will be: 
  contrast_matrix_m1 <- makeContrasts(
  MCI_vs_CN = MCI - CN,
  Dementia_vs_CN = Dementia - CN,
  Dementia_vs_MCI = Dementia - MCI,
  levels = design_m1
  )
  fit2_m1 <- contrasts.fit(fit_m1, contrast_matrix_m1)
  fit2_m1 <- eBayes(fit2_m1)
  
  topTable(fit2_m1, coef = "MCI_vs_CN") 
  topTable(fit2_m1, coef = "Dementia_vs_CN")  
  topTable(fit2_m1, coef = "Dementia_vs_MCI") 
  summary_results_m1 <- decideTests(fit2_m1,method = "nestedF", adjust.method = "none", p.value = 0.05)
  table(summary_results_m1)

  
}
if(METHOD_FEATURE_FLAG == 1){

  significant_dmp_filter_m1 <- summary_results_m1 != 0 
  significant_cpgs_m1_DMP <- unique(rownames(summary_results_m1)[
    apply(significant_dmp_filter_m1, 1, any)])
  print(paste("The significant CpGs after DMP are:",
             paste(significant_cpgs_m1_DMP, collapse = ", ")))
  print(paste("Length of CpGs after DMP:", 
              length(significant_cpgs_m1_DMP)))
  
  pickedFeatureName_m1_afterDMP<-c(phenotic_features_m1,significant_cpgs_m1_DMP)
  df_fs_method1<-df_fs_method1[,pickedFeatureName_m1_afterDMP]

  dim(df_fs_method1)
}
Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 1){
  
  library(recipes)
  df_picked <- df_fs_method1
 
  rec <- recipe(DX ~ ., data = df_picked) %>%
    step_zv(all_predictors()) %>%  
   # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked)

  processed_data_m1 <- bake(rec_prep, new_data = df_picked)
  dim(processed_data_m1)
  processed_data_m1_df<-as.data.frame(processed_data_m1)
  rownames(processed_data_m1_df)<-rownames(df_picked)
}
if(METHOD_FEATURE_FLAG == 1){
  AfterProcess_FeatureName_m1<-colnames(processed_data_m1)
  head(AfterProcess_FeatureName_m1)
  tail(AfterProcess_FeatureName_m1)
}
if(METHOD_FEATURE_FLAG == 1){
  head(processed_data_m1[,1:5])
}
if(METHOD_FEATURE_FLAG == 1){
  lastColumn_NUM<-dim(processed_data_m1)[2]
  last5Column_NUM<-lastColumn_NUM-5
  head(processed_data_m1[,last5Column_NUM :lastColumn_NUM])
}

(2) Method Two - PCA

if(METHOD_FEATURE_FLAG == 2){
  bloodPropFeatureName<-c("RID.a","prop.B","prop.NK",
                          "prop.CD4T","prop.CD8T","prop.Mono",
                          "prop.Neutro","prop.Eosino")
  pickedFeatureName_m2<-c("DX","age.now",
                          "PTGENDER",bloodPropFeatureName,
                          "ABETA","TAU","PTAU",featureName_CpGs)
  df_fs_method2<-clean_merged_df[,pickedFeatureName_m2]
}
Use “Recipe” to Preprocess the Data
if(METHOD_FEATURE_FLAG == 2){
  library(recipes)

  rec <- recipe(DX ~ ., data = df_fs_method2) %>%
    step_zv(all_predictors()) %>%
    step_normalize(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_fs_method2)

  processed_data_m2 <- bake(rec_prep, new_data = df_fs_method2)
  dim(processed_data_m2)
}
PCA
if(METHOD_FEATURE_FLAG == 2){
  
  X_df_m2<-subset(processed_data_m2,select = -DX)
  Y_df_m2<-processed_data_m2$DX

  pca_result <- prcomp(X_df_m2, center = TRUE, scale. = TRUE)

  summary(pca_result)

  screeplot(pca_result,type="lines")

}
if(METHOD_FEATURE_FLAG == 2){
  
  PCA_component_threshold<-0.7
}
if(METHOD_FEATURE_FLAG == 2){
  library(caret)
  preproc<-preProcess(X_df_m2,method="pca",
                      thresh = PCA_component_threshold)
  X_df_m2_transformed_PCA <- predict(preproc,X_df_m2)
  data_processed_PCA<-data.frame(X_df_m2_transformed_PCA,Y_df_m2)
  colnames(data_processed_PCA)[
    which(colnames(data_processed_PCA)=="Y_df_m2")]<-"DX"
  head(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 2){
  processed_data_m2<-data_processed_PCA
  AfterProcess_FeatureName_m2<-colnames(data_processed_PCA)
}

(3) Method Three - Convert to Binary Classes

if(METHOD_FEATURE_FLAG == 3){
  
  df_fs_method3<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 3){
  phenotic_features_m3<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m3<-c(phenotic_features_m3,featureName_CpGs)
  df_picked_m3<-df_fs_method3[,pickedFeatureName_m3]

  df_picked_m3$DX<-as.factor(df_picked_m3$DX)
  df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
  head(df_picked_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
  dim(df_picked_m3)
}
Change to Two-Class Classification
if(METHOD_FEATURE_FLAG == 3){
  df_picked_m3<-df_picked_m3 %>% mutate(
    DX = ifelse(DX == "CN", "CN",ifelse(DX 
    %in% c("MCI","Dementia"),"CI",NA)))
  
  df_picked_m3$DX<-as.factor(df_picked_m3$DX)
  df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)

  head(df_picked_m3[1:10],n=3)

}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 3){
  pheno_data_m3 <- df_picked_m3[,phenotic_features_m3] 
  head(pheno_data_m3[,1:5],n=3)

  design_m3 <- model.matrix(~0 + .,data=pheno_data_m3)

  colnames(design_m3)[colnames(design_m3) == "DXCN"] <- "CN"
  colnames(design_m3)[colnames(design_m3) == "DXCI"] <- "CI"

  head(design_m3)

  beta_values_m3 <- t(as.matrix(df_fs_method3[,featureName_CpGs]))

}

To perform the differential analysis - Differentially Methylated Positions (DMP) - we must define the contrast we are interested in. In method 3 we focus on two groups, so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 3){

  fit_m3 <- lmFit(beta_values_m3, design_m3)
  head(fit_m3$coefficients)


  contrast.matrix <- makeContrasts(CI - CN, levels = design_m3)
 
  fit2_m3 <- contrasts.fit(fit_m3, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m3 <- eBayes(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  decideTests(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  dmp_results_m3_try1 <- decideTests(
    fit2_m3, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m3_try1)

}
if(METHOD_FEATURE_FLAG == 3){
  # Identify DMPs, we will use this one:
  dmp_results_m3 <- decideTests(
    fit2_m3, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m3)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 3){

  significant_dmp_filter <- dmp_results_m3 != 0 
  significant_cpgs_m3_DMP <- rownames(dmp_results_m3)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m3_afterDMP<-c(phenotic_features_m3,significant_cpgs_m3_DMP)
  df_picked_m3<-df_picked_m3[,pickedFeatureName_m3_afterDMP]

  dim(df_picked_m3)
}
Visualize the results of DMP

The “Volcano Plot” is one way to visualize the results of a differential analysis.

The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).

Interpretation of logFC:

  • Positive LogFC: Indicates that the measurement is higher in the first group compared to the second group, here means hypermethylation (increase in methylation).

  • Negative LogFC: Indicates that the measurement is lower in the first group compared to the second group, here means hypomethylation (decrease in methylation) in the experimental condition compared to the reference.

  • LogFC of 0: Indicates no difference in the measurement between the two groups.

The y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we use the B statistic. The log-odds is calculated as \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
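
Since \(B\) is the natural log of the posterior odds, it can be mapped back to a posterior probability of differential methylation (under the model’s assumptions):

```latex
B = \log_e\!\left(\frac{p}{1-p}\right)
\quad\Longrightarrow\quad
p = \frac{e^{B}}{1 + e^{B}}
```

For example, \(B = 0\) corresponds to \(p = 0.5\), i.e. even odds of differential methylation.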

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 3){
  full_results_m3 <- topTable(fit2_m3, number=Inf)
  full_results_m3 <- tibble::rownames_to_column(full_results_m3,"ID")
  head(full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  sorted_full_results_m3 <- full_results_m3[
    order(full_results_m3$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  library(ggplot2)
  ggplot(full_results_m3,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 3){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m3 <- full_results_m3 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m3, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to the P value

if(METHOD_FEATURE_FLAG == 3){
  ggplot(full_results_m3,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 3){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m3 <- full_results_m3 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m3, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 3){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m3) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m3)

  processed_data_m3 <- bake(rec_prep, new_data = df_picked_m3)
  processed_data_m3_df <- as.data.frame(processed_data_m3)
  rownames(processed_data_m3_df) <- rownames(df_picked_m3)
  dim(processed_data_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  AfterProcess_FeatureName_m3<-colnames(processed_data_m3)
  head(AfterProcess_FeatureName_m3)
  tail(AfterProcess_FeatureName_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  levels(df_picked_m3$DX)
}
if(METHOD_FEATURE_FLAG == 3){
  head(processed_data_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
  lastColumn_NUM_m3<-dim(processed_data_m3)[2]
  last5Column_NUM_m3<-lastColumn_NUM_m3-5
  head(processed_data_m3[,last5Column_NUM_m3 :lastColumn_NUM_m3])
}
if(METHOD_FEATURE_FLAG == 3){
  levels(processed_data_m3$DX)
}

(4) Method Four - CN vs AD

In this method, only the CN and AD (Dementia) classes are considered.

if(METHOD_FEATURE_FLAG == 4){
  
  df_fs_method4<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 4){
  phenotic_features_m4<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m4<-c(phenotic_features_m4,featureName_CpGs)
  df_picked_m4<-df_fs_method4[,pickedFeatureName_m4]

  df_picked_m4$DX<-as.factor(df_picked_m4$DX)
  df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
  head(df_picked_m4[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
  dim(df_picked_m4)
}
## [1]  648 5006
Filter and Change to Classification with ‘CN vs AD (Dementia)’
if(METHOD_FEATURE_FLAG == 4){
  df_picked_m4<-df_picked_m4 %>%  filter(DX != "MCI") %>% droplevels()

  
  df_picked_m4$DX<-as.factor(df_picked_m4$DX)
  df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)

  head(df_picked_m4[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 4){
  print(dim(df_picked_m4))
  print(table(df_picked_m4$DX))
}
## [1]  315 5006
## 
##       CN Dementia 
##      221       94
if(METHOD_FEATURE_FLAG == 4){
  df_fs_method4 <- df_fs_method4 %>%  filter(DX != "MCI") %>% droplevels()
  df_fs_method4$DX<-as.factor(df_fs_method4$DX)
  print(head(df_fs_method4))
  print(dim(df_fs_method4))
}
## (console output truncated: `head(df_fs_method4)` spills the names of all
## selected CpG columns; the full listing is omitted here)
##      cg03029566 cg14129832 cg14629397 cg18662228 cg08407901 cg06101792 cg00030117 cg09949906 cg01113595 cg15014361 cg02981209 cg16140565 cg11982081 cg24080129 cg12568536 cg05724451 cg04990378
##      cg21783012 cg16995742 cg23764766 cg03122926 cg22237644 cg05023192 cg04577745 cg13038767 cg04073914 cg01016092 cg01600516 cg21692241 cg06273195 cg19154950 cg05890457 cg24506579 cg05712748
##      cg21106100 cg02866897 cg02302183 cg20414709 cg03221390 cg01594260 cg20831491 cg16142759 cg21290550 cg13972557 cg24057558 cg04728936 cg22251955 cg09668218 cg05806018 cg00017157 cg11920595
##      cg04998327 cg02507579 cg01378439 cg16702660 cg23414024 cg05321907 cg00924943 cg27187580 cg18673341 cg10985055 cg23680829 cg08059778 cg17671621 cg09281805 cg16655343 cg24844518 cg05528899
##      cg04777551 cg14834300 cg24741068 cg07878625 cg10811640 cg20221740 cg10444583 cg27467876 cg03781170 cg24426788 cg19364778 cg07501029 cg16899969 cg18467790 cg13031029 cg24186901 cg13323954
##      cg00466309 cg15909443 cg20139683 cg26212480 cg07178458 cg27362989 cg10750306 cg04605872 cg00944631 cg21242448 cg17342292 cg15602423 cg13079150 cg24136292 cg10923851 cg08210706 cg12413138
##      cg13514954 cg26777760 cg15988569 cg09084244 cg12124890 cg01081395 cg11835797 cg01667144 cg15669985 cg11786587 cg09735782 cg20673407 cg24074981 cg19248407 cg11802689 cg05704942 cg09418475
##      cg06616511 cg00723973 cg24846009 cg24738483 cg23712855 cg00845806 cg06699671 cg01104393 cg10032780 cg04798314 cg17327157 cg09985794 cg23162598 cg07615678 cg08408305 cg02956194 cg17171259
##      cg26222222 cg22627029 cg24682077 cg08657228 cg03701469 cg09289202 cg15912814 cg24790801 cg22256607 cg23766887 cg14555881 cg07882838 cg01715830 cg06002867 cg23052585 cg27341708 cg11266396
##      cg17149765 cg01118640 cg21838924 cg14799809 cg02907150 cg19405842 cg06545341 cg04665049 cg19577958 cg25888700 cg13054220 cg21566433 cg12466610 cg07861180 cg01543583 cg08014404 cg03327352
##      cg15693668 cg09829645 cg04531182 cg11988372 cg24627956 cg23996696 cg16762802 cg16864708 cg22452543 cg02832783 cg03391801 cg11602758 cg07618759 cg17419220 cg02994943 cg13765957 cg20300784
##      cg04982228 cg16955800 cg22276800 cg07909498 cg24309769 cg12650227 cg16044734 cg14609402 cg16733676 cg18455878 cg02368820 cg05452174 cg00033213 cg26365090 cg18523259 cg26567385 cg05580683
##      cg06286533 cg16490124 cg01127608 cg14029254 cg26418790 cg02320265 cg05130642 cg27433479 cg04158402 cg12179578 cg13767616 cg09730272 cg22508145 cg03386329 cg20205188 cg11884832 cg04845852
##      cg14599155 cg02735334 cg22411599 cg15491125 cg27288829 cg08250118 cg04462915 cg27153751 cg23499373 cg09476440 cg03330259 cg22336867 cg02195366 cg20223677 cg13078798 cg00256329 cg08880261
##      cg26217827 cg05088151 cg16971657 cg06065495 cg03525554 cg25649515 cg00293330 cg26077133 cg14232344 cg04907664 cg14863642 cg08779649 cg07456472 cg16214670 cg12169700 cg09623464 cg06777697
##      cg08084984 cg09866143 cg16802892 cg07465457 cg02679322 cg22169467 cg11585022 cg19935756 cg24328927 cg13885788 cg23936477 cg15828613 cg00553601 cg20017683 cg11758647 cg02173328 cg07269319
##      cg08603678 cg19539986 cg00691480 cg05696779 cg24264679 cg16590821 cg03770889 cg01004097 cg20766178 cg11666326 cg03749159 cg25561557 cg04028540 cg01413796 cg13244998 cg13203541 cg06060457
##      cg06042004 cg05200992 cg18827503 cg13564529 cg01201512 cg16932018 cg26069044 cg14877834 cg00913770 cg02819655 cg19214707 cg19504860 cg03334316 cg12920781 cg22862357 cg15450782 cg08458132
##      cg05363576 cg12262617 cg03108651 cg04531698 cg08855111 cg25450321 cg22901347 cg00156497 cg03088219 cg10444733 cg04926069 cg06683092 cg15355235 cg27501723 cg02909570 cg00546757 cg05340866
##      cg04305804 cg05238218 cg08477332 cg27488875 cg12682323 cg08439705 cg12602563 cg01215118 cg11404906 cg25721006 cg05799859 cg04238896 cg22710716 cg18615537 cg14629612 cg09148704 cg10140678
##      cg13023833 cg20337029 cg09588531 cg11637006 cg10306192 cg25664050 cg16818568 cg17779733 cg08043513 cg26096605 cg09068955 cg25674027 cg05504986 cg07225509 cg12074150 cg05331763 cg15360451
##      cg09460641 cg17365146 cg16533028 cg14749747 cg25508573 cg13072209 cg13371681 cg01359658 cg05767119 cg24836826 cg08963013 cg13038195 cg17738613 cg02575605 cg27277239 cg12340462 cg03466780
##      cg04683516 cg10058204 cg11102724 cg12195446 cg02772171 cg07826058 cg00009523 cg25601713 cg04945432 cg05895137 cg04123498 cg14615128 cg05918715 cg03699194 cg27056740 cg02154276 cg18346634
##      cg10987536 cg17251507 cg09650803 cg09134165 cg16241932 cg20502501 cg02333283 cg12565083 cg07800510 cg02216016 cg08782677 cg16357225 cg12155450 cg07456585 cg17186592 cg12240569 cg18500967
##      cg19992190 cg16915302 cg06498495 cg16956806 cg21499289 cg00114625 cg10155537 cg15925199 cg15568074 cg04425994 cg24991845 cg22026089 cg13569207 cg23623404 cg00018261 cg11294350 cg06012903
##      cg00878023 cg01835443 cg24638099 cg17265515 cg04066495 cg06746449 cg00399450 cg01824170 cg11418303 cg17292622 cg14192979 cg07258715 cg12120033 cg25342508 cg17906851 cg01933473 cg13522370
##      cg00813093 cg09413252 cg16661157 cg23892028 cg18683228 cg14797147 cg16089727 cg15083522 cg18857647 cg09060772 cg17155524 cg13239134 cg09636756 cg01872988 cg16894263 cg22481673 cg08136432
##      cg10767615 cg14295915 cg00767423 cg13883027 cg17509989 cg06099085 cg20485607 cg10511229 cg01055691 cg02940070 cg27129755 cg00046099 cg13120932 cg16771215 cg24506221 cg25624849 cg08041188
##      cg03737947 cg26679884 cg08919780 cg01926326 cg08394893 cg27114706 cg18514595 cg27308738 cg00413734 cg11653314 cg05393861 cg17151385 cg19218082 cg22305850 cg27625131 cg26846609 cg03493899
##      cg15613905 cg26038514 cg12036633 cg03544800 cg10482512 cg05400155 cg06237602 cg08516018 cg22221554 cg00254095 cg00836161 cg10701801 cg17178900 cg02761375 cg24479484 cg15744124 cg08537127
##      cg07056506 cg21070081 cg15703000 cg09072865 cg00112256 cg11074323 cg13102742 cg19860695 cg13261753 cg26089705 cg15281606 cg06279067 cg18213661 cg11558795 cg04867412 cg26590106 cg17771423
##      cg23554546 cg12214399 cg02823329 cg04453550 cg09998151 cg03075889 cg26428339 cg26882525 cg17035182 cg19300401 cg20597646 cg00295418 cg08890411 cg25656978 cg15441831 cg06844213 cg18561199
##      cg05349513 cg15517438 cg05091477 cg09539170 cg21139150 cg13160852 cg23019589 cg19793163 cg17181941 cg25321762 cg25059696 cg15418221 cg25001484 cg21927991 cg14061270 cg02154924 cg13688351
##      cg01941243 cg18557837 cg05134736 cg24460485 cg10942642 cg22557383 cg25026580 cg15395171 cg05977333 cg26709433 cg01835922 cg14195178 cg21072408 cg15977272 cg00274640 cg10590622 cg06334238
##      cg18576044 cg22138998 cg12077809 cg26856631 cg07697459 cg16767880 cg03119308 cg05059349 cg18131458 cg00035449 cg14509777 cg00055165 cg23403836 cg02882301 cg17149911 cg21513542 cg00841008
##      cg21388339 cg25205946 cg20749341 cg23733925 cg10818676 cg23919742 cg10507965 cg21358336 cg08902358 cg07145234 cg10415021 cg08479532 cg08570077 cg21714731 cg26565914 cg13978098 cg09780150
##      cg01341801 cg00293269 cg00192980 cg19774683 cg06352616 cg07954607 cg08980509 cg20207108 cg19834421 cg13815695 cg00265812 cg26642936 cg25206026 cg14351440 cg04876534 cg08002427 cg17155577
##      cg00980980 cg00161838 cg04888234 cg15195148 cg10835413 cg13081560 cg08977311 cg22405556 cg11209190 cg10523200 cg08096656 cg04371001 cg19274180 cg02814135 cg01156747 cg04596655 cg09350919
##      cg15133953 cg13074018 cg09352925 cg12259892 cg14780448 cg01296877 cg06362582 cg13067096 cg01223071 cg11450075 cg12897690 cg24926791 cg14375582 cg13603318 cg20187719 cg01379313 cg17590101
##      cg10786572 cg00810519 cg08327960 cg03842120 cg01500431 cg22109827 cg27481428 cg07210229 cg19723528 cg09022647 cg05680665 cg01188578 cg09312897 cg00680673 cg19168249 cg03639185 cg05383895
##      cg23939077 cg07505631 cg20270941 cg09157251 cg25755428 cg13950578 cg26422465 cg27248959 cg11377625 cg19750824 cg02902672 cg09364373 cg13915481 cg06634917 cg05476522 cg20741235 cg12158214
##      cg17723206 cg05935445 cg01251131 cg25997988 cg14152758 cg03600007 cg01201914 cg06389521 cg18828306 cg20543970 cg02901522 cg13530263 cg24748621 cg16596266 cg18021992 cg15388766 cg16211147
##      cg19384241 cg11438323 cg03526459 cg20213329 cg03478816 cg18717600 cg00322820 cg11965913 cg26287822 cg10991108 cg16398051 cg16641060 cg16104636 cg27086157 cg17479100 cg19518539 cg26690407
##      cg25225807 cg15535896 cg24398793 cg20859738 cg12262000 cg15932613 cg15229668 cg06139288 cg16081854 cg26354017 cg18698799 cg07446674 cg27093646 cg06055266 cg00631877 cg12501287 cg11723923
##      cg12277627 cg00348031 cg17508549 cg14228103 cg16310958 cg06500073 cg01721300 cg05847731 cg10690713 cg24065597 cg06671703 cg18821122 cg02370566 cg20704148 cg08152721 cg14056849 cg13506281
##      cg09430642 cg08514194 cg05095647 cg03655023 cg11308037 cg23022053 cg14113515 cg13017022 cg27494055 cg01462799 cg26824678 cg03050491 cg07684215 cg21634283 cg09307883 cg07498088 cg15727053
##      cg24245216 cg12482297 cg03714923 cg08210468 cg19505129 cg00814186 cg14964115 cg15609861 cg16431836 cg01479916 cg05385718 cg05841700 cg06051619 cg06487085 cg00866176 cg12914114 cg10738003
##      cg06673178 cg27592925 cg10596483 cg25977769 cg25712015 cg23412653 cg21934405 cg19321437 cg16756025 cg01978703 cg10738648 cg16579946 cg08138245 cg12026625 cg07386410 cg24996718 cg15570860
##      cg20060160 cg15132295 cg01463110 cg15844450 cg21137943 cg23836570 cg15165694 cg02262167 cg24284539 cg22471695 cg03748372 cg08842642 cg20798066 cg10326673 cg09785377 cg15266057 cg26278987
##      cg05374090 cg04487202 cg24018148 cg00474373 cg06427702 cg16536985 cg15255859 cg05308244 cg22807592 cg20370184 cg25940844 cg00835812 cg17965552 cg09456260 cg15555926 cg14544439 cg05961492
##      cg07176285 cg10240906 cg15652532 cg08849813 cg01869765 cg22120018 cg10776061 cg06616857 cg05025374 cg26908356 cg10482495 cg03335173 cg02122327 cg12784167 cg01058588 cg05792312 cg21234342
##      cg08076861 cg23947872 cg18190829 cg26628435 cg25427918 cg15633912 cg02495179 cg21203249 cg10829391 cg12386614 cg26263138 cg11524947 cg03871183 cg26011946 cg23951868 cg12614702 cg17363084
##      cg09091181 cg04875706 cg25317262 cg02494911 cg27578568 cg22972806 cg05967787 cg04402345 cg02775175 cg14241748 cg03706056 cg26495595 cg19471911 cg00956039 cg20823859 cg14159672 cg21360798
##      cg15950547 cg15907464 cg19680693 cg07158505 cg06813297 cg02078724 cg01407424 cg15618087 cg21854924 cg01957222 cg24730756 cg07533224 cg03635532 cg04242342 cg05373298 cg16908938 cg14127016
##      cg09056691 cg22045528 cg07133434 cg27501007 cg07674503 cg06002687 cg06316758 cg07590402 cg02933448 cg22618269 cg18932686 cg11184697 cg17970282 cg16956665 cg17036062 cg08405463 cg21543103
##      cg07262858 cg04712194 cg14970569 cg05707218 cg13905298 cg11894108 cg03192273 cg15034216 cg05427163 cg20981163 cg24009806 cg17196155 cg05782975 cg00905457 cg06297686 cg06779802 cg04520693
##      cg03661789 cg07648454 cg12551908 cg22671798 cg10860619 cg00345083 cg00337921 cg17399684 cg09247979 cg27017735 cg04610028 cg01472026 cg03643559 cg07750402 cg07240846 cg06915915 cg12261681
##      cg07796782 cg16637584 cg05730108 cg10122899 cg01871867 cg15211026 cg02890235 cg19998137 cg06972843 cg06576965 cg10818284 cg03224005 cg04481077 cg10923350 cg05034175 cg09342610 cg12898220
##      cg06052372 cg02246922 cg20566384 cg13156574 cg08798116 cg03155755 cg04422742 cg18580559 cg07126775 cg12284142 cg16471877 cg18618432 cg00730761 cg22077592 cg06609793 cg16221895 cg07393670
##      cg08673419 cg00259849 cg08848711 cg27083089 cg26159385 cg04481635 cg20549346 cg25436480 cg06483046 cg00729461 cg14780957 cg07632860 cg23443158 cg16194687 cg08797383 cg14720319 cg08916385
##      cg11732753 cg02074316 cg04771146 cg22346540 cg15499467 cg05208607 cg24608181 cg26674826 cg04217946 cg04727458 cg19198567 cg13815872 cg24086348 cg00968488 cg24904436 cg12534577 cg04317640
##      cg26128147 cg02550738 cg17396400 cg00011891 cg05407200 cg01008088 cg15865722 cg11583848 cg07946630 cg15835795 cg03966315 cg06055478 cg07791065 cg23603995 cg24232980 cg25374269 cg09146364
##      cg00200463 cg02615131 cg20078646 cg11268585 cg13913990 cg06417478 cg13799572 cg20964965 cg24090628 cg26023405 cg05111645 cg00832270 cg17939448 cg12785025 cg23762217 cg06621919 cg18310072
##      cg26584339 cg18037388 cg01876809 cg15654812 cg06955954 cg14908122 cg16576930 cg03172493 cg17382566 cg03944921 cg13682241 cg06864789 cg17853057 cg18958984 cg14655569 cg17616663 cg04861534
##      cg19441529 cg11882358 cg08280054 cg01871025 cg00729708 cg15216357 cg04675919 cg05688478 cg06700506 cg25179313 cg11663691 cg09207137 cg04316537 cg07383357 cg00113623 cg12469381 cg07989438
##      cg20305683 cg06236987 cg18386008 cg22893969 cg11091790 cg18124907 cg19495614 cg05941375 cg19787013 cg17588704 cg25644740 cg12403148 cg03993368 cg14227325 cg03721887 cg16918438 cg24559073
##      cg19068385 cg20035294 cg08216425 cg09965404 cg16499140 cg20360416 cg04657146 cg11152253 cg13388618 cg10403109 cg11424828 cg10306780 cg15094228 cg12620265 cg18709904 cg22202169 cg00963467
##      cg07056794 cg00762003 cg16361921 cg27224751 cg05037630 cg22776211 cg16638301 cg11154719 cg21953876 cg15211500 cg17699276 cg03756044 cg04675306 cg04528326 cg00139317 cg12716696 cg21016188
##      cg00116709 cg11399582 cg27049594 cg01914365 cg15295200 cg04575501 cg16954525 cg02484732 cg01384686 cg14859874 cg00892228 cg16527629 cg19407410 cg16617830 cg16144436 cg22505202 cg00939409
##      cg09978401 cg18414950 cg16794291 cg10643429 cg08952424 cg18395382 cg15410402 cg17825572 cg11458217 cg14194326 cg26983017 cg26981746 cg17355066 cg00271873 cg24859648 cg26822438 cg02921434
##      cg17189724 cg24584738 cg19716713 cg15184869 cg10470368 cg05534333 cg23779047 cg16412513 cg08429705 cg24697097 cg20608847 cg18845598 cg27611887 cg27522357 cg02496423 cg12255092 cg20756026
##      cg00602930 cg17008556 cg01081438 cg23374711 cg25673075 cg01777565 cg07167872 cg01733439 cg06144999 cg00051154 cg18403317 cg16178271 cg13115118 cg00675157 cg04234536 cg10369879 cg23595826
##      cg09371091 cg02656016 cg01491428 cg03288751 cg00814218 cg20208613 cg06995503 cg26033510 cg15734257 cg25846190 cg23834765 cg14983172 cg16162930 cg04780373 cg12109978 cg18136963 cg15774752
##      cg07304760 cg14804181 cg25911220 cg23184276 cg20904336 cg16403901 cg14258356 cg06031234 cg23804921 cg27395310 cg03374522 cg11185978 cg18805164 cg02165546 cg26640879 cg00478198 cg08569059
##      cg10678429 cg08720028 cg01565322 cg01504940 cg00645049 cg19373347 cg05873820 cg20968048 cg15124400 cg10566479 cg06403901 cg20781383 cg25442267 cg26654770 cg16505502 cg06266461 cg12918445
##      cg12313868 cg26201401 cg06264882 cg09856996 cg13387643 cg19513111 cg15460297 cg11425580 cg22274273 cg19225953 cg14361804 cg04181991 cg13204538 cg16402757 cg17547524 cg06223162 cg23277098
##      cg23919845 cg04768387 cg20017124 cg13649400 cg19415746 cg08268047 cg14705391 cg02049865 cg14626875 cg17268094 cg01128042 cg19389973 cg22053855 cg01124926 cg11450947 cg06834235 cg07488092
##      cg03570263 cg09438069 cg17639056 cg22304519 cg05079227 cg05380919 cg27558057 cg06013788 cg22969661 cg26444086 cg25692928 cg17635970 cg00078867 cg27069132 cg17839758 cg17061760 cg25140213
##      cg03038395 cg08636328 cg08270148 cg17238522 cg20445038 cg21225796 cg05867245 cg10950297 cg14507637 cg16202259 cg06615444 cg19079513 cg03691313 cg19799454 cg03020684 cg03723481 cg01530521
##      cg23364541 cg09059153 cg18102950 cg10122885 cg03403996 cg25977965 cg02621658 cg25165144 cg26682103 cg11546683 cg14071112 cg22984586 cg19235109 cg05836189 cg17628377 cg25826070 cg26937008
##      cg22164912 cg26375010 cg18828303 cg03992069 cg23098789 cg21259115 cg15582794 cg08198851 cg25388952 cg15029183 cg24694833 cg15257930 cg19178509 cg26035071 cg26421947 cg18288715 cg06837403
##      cg11872370 cg02285579 cg04550935 cg01476442 cg12709057 cg24347720 cg26146690 cg04327763 cg13862711 cg14267065 cg04255382 cg16032134 cg24853868 cg26563651 cg05891136 cg02451693 cg05109619
##      cg23627980 cg14831665 cg07474670 cg14006678 cg04412904 cg00011200 cg23943944 cg21760862 cg10423996 cg23350716 cg12704708 cg13688687 cg01427108 cg10880252 cg21397839 cg12729177 cg23392381
##      cg18107314 cg11991151 cg19415116 cg22867893 cg02217425 cg05522042 cg12953206 cg04509103 cg17457545 cg02171833 cg11227702 cg13058551 cg13740636 cg18584561 cg27284883 cg00380985 cg14007688
##      cg02945674 cg18150287 cg26296371 cg20961245 cg17393140 cg08446187 cg00445202 cg07149083 cg26454172 cg25598710 cg08275242 cg08839358 cg12206353 cg14856563 cg16976875 cg13591052 cg07115108
##      cg25645840 cg14314132 cg04841583 cg03905487 cg00279662 cg14137558 cg19810816 cg17758652 cg15314470 cg26161652 cg24315885 cg04263740 cg10999462 cg17279365 cg12333628 cg06922212 cg10080013
##      cg04026379 cg01342901 cg22162835 cg01150227 cg01451645 cg24648384 cg08024471 cg08265308 cg05416337 cg26536949 cg09639108 cg08338641 cg09197234 cg02839725 cg23187802 cg04027004 cg10088372
##      cg11969330 cg14168080 cg10723556 cg08600378 cg11164659 cg19821612 cg14542879 cg15771339 cg19738233 cg00967012 cg08625210 cg10721440 cg02274705 cg27160885 cg22715629 cg24503407 cg08461617
##      cg10637509 cg25165659 cg06079963 cg03198009 cg16232867 cg05351360 cg14068184 cg07355270 cg17441733 cg03885028 cg05193149 cg26121752 cg19584075 cg01982279 cg22223709 cg07484678 cg04627110
##      cg08960045 cg09746326 cg07795413 cg05161773 cg20679188 cg11169344 cg18689730 cg02630646 cg12855313 cg27286614 cg22646149 cg12744907 cg14764203 cg01207755 cg05818501 cg14051366 cg03504002
##      cg25306893 cg14918074 cg14181112 cg15369199 cg05785344 cg06124141 cg23855802 cg04072009 cg01333616 cg26439324 cg08421632 cg08648877 cg03325394 cg23896353 cg17724121 cg08455905 cg14893161
##      cg13211008 cg04960964 cg20913114 cg21415084 cg02932958 cg25576048 cg05141217 cg05966078 cg14942092 cg22402121 cg02171206 cg05611160 cg24863802 cg11705504 cg13946163 cg00962106 cg22287211
##      cg14629010 cg14331362 cg20004147 cg08986950 cg10315562 cg14767338 cg00849191 cg08745107 cg12143138 cg04277055 cg24961286 cg22142142 cg06945800 cg04872051 cg19062189 cg16755189 cg11281291
##      cg01650464 cg21792493 cg06110166 cg06476934 cg12374770 cg01758122 cg05579559 cg24437580 cg06684911 cg22681945 cg21634944 cg00347850 cg11949518 cg27527657 cg08199506 cg23834181 cg26941787
##      cg16006841 cg07134368 cg01740135 cg17369140 cg24153901 cg11438287 cg16570885 cg17811452 cg11233153 cg08873063 cg05461361 cg24232370 cg13423887 cg16000638 cg01662749 cg14782559 cg02890812
##      cg17240976 cg01555661 cg16051083 cg09729660 cg11286989 cg25897349 cg15775217 cg06048169 cg27470278 cg24139837 cg12298823 cg25790212 cg06624143 cg13466755 cg05239680 cg04645024 cg11072201
##      cg08055002 cg01280698 cg09139047 cg01366378 cg22430708 cg26642774 cg10061320 cg04455999 cg16639627 cg07037055 cg06191872 cg13830619 cg03104298 cg04302178 cg11789991 cg02427933 cg01318188
##      cg13182391 cg16814680 cg14516385 cg15117507 cg17748470 cg18182981 cg15132216 cg05450979 cg03057303 cg11044162 cg21414424 cg17386473 cg22774704 cg11291009 cg22933800 cg11314779 cg21205654
##      cg07211915 cg10296238 cg21697769 cg21691076 cg25880954 cg15536552 cg13739190 cg13851368 cg12833414 cg17430903 cg09797202 cg02770249 cg26864304 cg09664314 cg07628886 cg01405303 cg04149024
##      cg09227616 cg19353052 cg09834142 cg12488572 cg05872808 cg03191359 cg25529585 cg02865277 cg00979438 cg06264089 cg00421199 cg09886258 cg17503853 cg06083932 cg07747558 cg12543766 cg15958422
##      cg07781082 cg26129200 cg25445671 cg08041448 cg07781090 cg17441804 cg15547764 cg10499451 cg00063608 cg17818432 cg09863391 cg05874912 cg12738079 cg02968327 cg07712165 cg10347326 cg10919053
##      cg13574174 cg08925606 cg18858121 cg19360212 cg12623396 cg23113041 cg18452169 cg26896756 cg20022541 cg21201934 cg19854896 cg18756931 cg19056391 cg00243527 cg15138543 cg03874513 cg11648471
##      cg12743416 cg00829575 cg09120722 cg07799180 cg09253663 cg06704717 cg01326421 cg11401796 cg11823448 cg07365741 cg08102564 cg05323542 cg08880082 cg01303569 cg16871435 cg23251359 cg23496593
##      cg17284124 cg26251192 cg03359666 cg17122979 cg27244972 cg08496601 cg16181678 cg07215528 cg08108619 cg17217478 cg02079756 cg27070288 cg27450744 cg03651054 cg01212677 cg11857805 cg16775095
##      cg16715186 cg12646252 cg26764761 cg25247689 cg22955899 cg09310980 cg13324220 cg14513804 cg06824156 cg00433220 cg21560722 cg15247483 cg01463139 cg07516457 cg02489327 cg23991947 cg11173636
##      cg05749243 cg22682304 cg25129414 cg08900396 cg14918591 cg00696044 cg06393529 cg04512759 cg21765125 cg21284493 cg16655091 cg12521790 cg09518270 cg05475474 cg23247655 cg15668967 cg13452830
##      cg08983668 cg22635523 cg25601709 cg06352538 cg12727431 cg16962612 cg25452717 cg25249362 cg08750459 cg06548479 cg18714913 cg11519740 cg21692140 cg09034259 cg19770253 cg16515238 cg05455372
##      cg07806343 cg03250346 cg25528646 cg05958126 cg08118032 cg03726259 cg04298672 cg21442271 cg07822777 cg12738248 cg26342575 cg18140045 cg06614969 cg04218584 cg19791271 cg05291429 cg05935584
##      cg22953237 cg11720358 cg17971895 cg17530337 cg06115838 cg05383619 cg06549249 cg17602481 cg13276615 cg01332299 cg09705401 cg16432908 cg04319046 cg17341969 cg09510698 cg12768975 cg06394109
##      cg25243082 cg16836675 cg03821194 cg02567750 cg03900860 cg11902811 cg12908908 cg04493908 cg07216619 cg02283535 cg17253931 cg24686902 cg12134602 cg02487331 cg26457165 cg05995465 cg17664833
##      cg01759889 cg01924074 cg00597445 cg19848641 cg00084271 cg11111131 cg08282969 cg10055097 cg16675926 cg11194545 cg04302300 cg07773740 cg02107461 cg24883219 cg19718903 cg06804873 cg01013522
##      cg00914218 cg12526470 cg16874089 cg08324927 cg13342259 cg27485646 cg02627240 cg19859323 cg13077484 cg20673830 cg09079173 cg26220594 cg24401557 cg14772068 cg01816891 cg03906572 cg26301245
##      cg15559823 cg09535760 cg09579899 cg06225639 cg12129080 cg14002365 cg01720007 cg05803370 cg04396360 cg07166908 cg07507339 cg17824401 cg02605540 cg23913313 cg05084668 cg00616572 cg16245086
##      cg10133369 cg13120260 cg04456492 cg25645693 cg21658164 cg03605032 cg08788093 cg07700317 cg07301957 cg03890843 cg20361843 cg14238671 cg05070493 cg06738063 cg26119746 cg15930598 cg08024264
##      cg23365293 cg03812172 cg02316445 cg16817435 cg06538336 cg11261447 cg03617221 cg13077366 cg03370193 cg13117792 cg05096415 cg17995340 cg21019788 cg12154943 cg05929129 cg25514427 cg21785054
##      cg04292836 cg08348649 cg17330938 cg03038914 cg07951602 cg17056069 cg26059639 cg16594779 cg24699914 cg01236565 cg02389264 cg25415674 cg10286673 cg26708920 cg07572984 cg14544514 cg03484420
##      cg02168270 cg22917366 cg15586958 cg26864661 cg19780831 cg16243644 cg00201142 cg14378789 cg19707653 cg11607219 cg02619116 cg14369777 cg13516940 cg12624040 cg01153376 cg04882216 cg01225004
##      cg05594230 cg02937794 cg18751375 cg08221357 cg21149357 cg14358839 cg10892068 cg23681001 cg23954206 cg14121685 cg09100196 cg08188907 cg24664551 cg06915321 cg17095460 cg15250633 cg09284209
##      cg15600437 cg27609342 cg12064531 cg20662859 cg27300573 cg15002478 cg07530027 cg08693140 cg08669168 cg23352245 cg03167407 cg03324099 cg05213316 cg04924408 cg05091873 cg10662047 cg12016309
##      cg22569627 cg09738386 cg05452887 cg27207144 cg13033971 cg18983709 cg16397968 cg23192736 cg22857134 cg20713636 cg23119380 cg02179438 cg20272343 cg06354054 cg00718752 cg07824128 cg04791822
##      cg26590811 cg10691647 cg12322605 cg19797013 cg12077433 cg19238394 cg22307470 cg01387905 cg04508606 cg05365121 cg22787186 cg23727079 cg26801383 cg16531277 cg11851349 cg02295504 cg00553365
##      cg18065464 cg01430241 cg17283620 cg06134910 cg11870452 cg09854620 cg21159768 cg16191297 cg05093818 cg11573182 cg11186706 cg16567137 cg24861747 cg00981879 cg04497820 cg15532640 cg15535487
##      cg01414116 cg24832428 cg22504140 cg26936989 cg02510708 cg25692732 cg00939438 cg13928473 cg07210774 cg16852920 cg05092371 cg05061041 cg25790081 cg10780707 cg10050962 cg14247154 cg27353825
##      cg19512141 cg22542451 cg02032561 cg21864829 cg15465836 cg16788857 cg16429499 cg15044932 cg16764296 cg17848104 cg10701746 cg00332268 cg15715844 cg07979524 cg12981362 cg11229715 cg25943986
##      cg01991530 cg09636905 cg27015302 cg03111560 cg19332075 cg16180556 cg10274815 cg14911689 cg06378561 cg25929399 cg17386240 cg17917970 cg18786623 cg14737574 cg11047442 cg11540596 cg20707527
##      cg04546413 cg26734875 cg17741448 cg18239511 cg22666875 cg06579087 cg13177959 cg19635884 cg04524851 cg16742675 cg09687597 cg11638117 cg12471283 cg11400068 cg06675417 cg13115455 cg06734157
##      cg00534215 cg11673013 cg20767561 cg04156077 cg11727304 cg03187614 cg08624915 cg03828160 cg13825033 cg24114730 cg04467639 cg05176970 cg16458822 cg03276920 cg15876198 cg08950364 cg26764972
##      cg20077602 cg26380710 cg23177161 cg17763566 cg14553323 cg25492195 cg08551408 cg15637874 cg16510200 cg21127593 cg13744306 cg07428182 cg24801230 cg04850148 cg00648024 cg21035907 cg20684491
##      cg24417798 cg16423096 cg09352518 cg25150572 cg02891314 cg15391239 cg12449104 cg24017974 cg22111694 cg22823009 cg02401352 cg22459517 cg20372745 cg23660678 cg26813483 cg15579650 cg23541304
##      cg18424635 cg01388693 cg14859618 cg13240932 cg06612594 cg18932722 cg04376185 cg07581973 cg25951717 cg26308359 cg09986921 cg14303457 cg17623720 cg07761942 cg06441867 cg07130381 cg18882436
##      cg10983111 cg20442191 cg22712681 cg16723510 cg21787089 cg00859877 cg21681732 cg05875700 cg14992527 cg10981178 cg00532122 cg15975960 cg26371957 cg02622647 cg05116966 cg19616372 cg01802772
##      cg14651363 cg25416774 cg17811760 cg05947181 cg00811210 cg08159412 cg26846076 cg10363118 cg10681981 cg18253743 cg01828474 cg02668233 cg09732868 cg11973981 cg01562833 cg02095003 cg24533526
##      cg03272642 cg26786615 cg11791078 cg16999994 cg11706829 cg26261358 cg17600943 cg16529483 cg02356645 cg16866567 cg16119423 cg05971102 cg26076233 cg14465143 cg24194941 cg19010939 cg12156950
##      cg00146240 cg01716666 cg11369993 cg03681484 cg22645859 cg06330797 cg17920646 cg24307368 cg22853855 cg04497611 cg15627180 cg23564471 cg09780996 cg12861974 cg24697433 cg18110333 cg06012621
##      cg17197278 cg03825574 cg26810336 cg14990368 cg06018273 cg05032903 cg14193607 cg07187289 cg18949721 cg20418101 cg04346428 cg14505657 cg12417704 cg02372404 cg07480955 cg06152434 cg23513244
##      cg11021362 cg07990395 cg05223760 cg14279361 cg13635701 cg16340188 cg23154024 cg21081239 cg08853008 cg04971651 cg11124135 cg26198148 cg16120147 cg16029533 cg04493740 cg18307604 cg16200242
##      cg22682567 cg16556401 cg04821917 cg11978593 cg23759693 cg09411587 cg06111581 cg24783624 cg14928378 cg17234414 cg26756979 cg11082424 cg10149013 cg08279515 cg15971518 cg01878430 cg12544391
##      cg12556569 cg10869581 cg04024675 cg04754076 cg01269359 cg04649587 cg11818589 cg06950937 cg00150363 cg09012881 cg12962913 cg22963378 cg16314146 cg05225083 cg02656474 cg24156746 cg05813498
##      cg06558952 cg13023205 cg26746069 cg17444608 cg01362389 cg05237503 cg16288713 cg08055597 cg14642832 cg13016553 cg13663390 cg15488542 cg20979384 cg22484503 cg15384497 cg15129815 cg23694557
##      cg08298085 cg20495737 cg08923376 cg19840763 cg27123903 cg08629394 cg01003197 cg18707028 cg03485872 cg06068545 cg01186276 cg01857253 cg18827179 cg20416767 cg06796204 cg07309821 cg21966453
##      cg00272795 cg00173771 cg01176653 cg09255886 cg22931151 cg16734817 cg01809408 cg11173002 cg10388998 cg21557668 cg10463108 cg15290312 cg11601920 cg14532717 cg13226272 cg17058724 cg20498086
##      cg16343465 cg09737095 cg22850802 cg04622888 cg04924736 cg23974730 cg18399551 cg19268168 cg13822691 cg16247409 cg20670088 cg27112414 cg02691623 cg22151131 cg09019154 cg00231519 cg26007606
##      cg20641280 cg06334689 cg13655986 cg05207724 cg17831869 cg00028022 cg06190612 cg04282082 cg02834750 cg01771673 cg12073886 cg27483305 cg05305760 cg18989810 cg05192017 cg09790289 cg02682989
##      [ ... ~5,000 CpG probe column names (cg00018245 ... cg01410230) truncated ... ]
##      barcodes RID.a prop.B prop.NK prop.CD4T prop.CD8T prop.Mono prop.Neutro prop.Eosino DX age.now PTGENDER ABETA TAU PTAU PC1 PC2 PC3 ageGroup ageGroupsq DX_num uniqueID
##      Horvath
##  [ reached 'max' / getOption("max.print") -- omitted 6 rows ]
## [1]  315 5023
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 4){
  pheno_data_m4 <- df_picked_m4[,phenotic_features_m4] 
  print(head(pheno_data_m4[,1:5],n=3))

  design_m4 <- model.matrix(~0 + .,data=pheno_data_m4)

  colnames(design_m4)[colnames(design_m4) == "DXCN"] <- "CN"
  colnames(design_m4)[colnames(design_m4) == "DXDementia"] <- "Dementia"

  print(head(design_m4))

  beta_values_m4 <- t(as.matrix(df_fs_method4[,featureName_CpGs]))

}
##                           DX  age.now PTGENDER          PC1         PC2
## 200223270003_R03C01       CN 78.60000   Female -0.172761185  0.05745834
## 200223270003_R06C01       CN 80.40000   Female -0.003667305  0.08372861
## 200223270003_R07C01 Dementia 78.16441     Male -0.186779607 -0.01117250
##                     CN Dementia  age.now PTGENDERMale          PC1         PC2          PC3
## 200223270003_R03C01  1        0 78.60000            0 -0.172761185  0.05745834  0.005055871
## 200223270003_R06C01  1        0 80.40000            0 -0.003667305  0.08372861  0.029143653
## 200223270003_R07C01  0        1 78.16441            1 -0.186779607 -0.01117250 -0.032302430
## 200223270006_R04C01  1        0 80.67796            0 -0.037862929  0.01571950 -0.008685676
## 200223270007_R04C01  0        1 71.50000            0 -0.138852272  0.02990245 -0.031733844
## 200223270008_R05C01  1        0 83.48289            0 -0.212852602  0.05179824  0.018645401

In order to perform the differential analysis (identifying Differentially Methylated Positions, DMPs), we have to define the contrast we are interested in. In Method 4 we focus on two groups (CN and Dementia), so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 4){

  fit_m4 <- lmFit(beta_values_m4, design_m4)
  head(fit_m4$coefficients)


  contrast.matrix <- makeContrasts(Dementia - CN, levels = design_m4)
 
  fit2_m4 <- contrasts.fit(fit_m4, contrast.matrix)

  # Apply the empirical Bayes step to get the differential methylation statistics and p-values.

  fit2_m4 <- eBayes(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  decideTests(fit2_m4)
}
## TestResults matrix
##             Contrasts
##              Dementia - CN
##   cg08223187             0
##   cg15794987             0
##   cg04821830             0
##   cg24629711             0
##   cg17380855             0
## 4995 more rows ...
if(METHOD_FEATURE_FLAG == 4){
  dmp_results_m4_try1 <- decideTests(
    fit2_m4, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m4_try1)

}
## dmp_results_m4_try1
##    0 
## 5000

The constraint is too tight; let’s relax it.

if(METHOD_FEATURE_FLAG == 4){
  # Identify DMPs, we will use this one:
  dmp_results_m4 <- decideTests(
    fit2_m4, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m4)
}
## dmp_results_m4
##   -1    0    1 
##  187 4607  206
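The decideTests() output codes each CpG for the contrast as -1 (significantly hypomethylated), 0 (not significant), or +1 (significantly hypermethylated). A minimal sketch (made-up codes, not the study output) of how the nonzero rows are pulled out, mirroring the filtering step below:

```r
# Toy sketch (made-up codes, not study output): decideTests() returns a
# matrix of -1 / 0 / +1 per feature; the nonzero rows are the significant CpGs.
codes <- matrix(c(-1, 0, 1, 0, -1), ncol = 1,
                dimnames = list(paste0("cg", 1:5), "Dementia - CN"))
significant <- rownames(codes)[apply(codes != 0, 1, any)]
significant  # "cg1" "cg3" "cg5"
```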
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 4){

  significant_dmp_filter <- dmp_results_m4 != 0 
  significant_cpgs_m4_DMP <- rownames(dmp_results_m4)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m4_afterDMP<-c(phenotic_features_m4,significant_cpgs_m4_DMP)
  df_picked_m4<-df_picked_m4[,pickedFeatureName_m4_afterDMP]

  dim(df_picked_m4)
}
## [1] 315 399
Visualize the results of DMP

The “Volcano Plot” is one way to visualize the results of a differential methylation analysis.

The x-axis shows the log-fold change (logFC) in methylation levels between the two classes. For log2-transformed data this is \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\); for untransformed beta values, as fitted here, the contrast estimate reported as logFC is simply the difference in group means.

Interpretation of logFC:

  • Positive logFC: the measurement is higher in the first group of the contrast than in the second; here this means hypermethylation (an increase in methylation).

  • Negative logFC: the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation).

  • logFC of 0: no difference between the two groups.

The y-axis shows a measure of statistical significance, such as the log-odds (“B”) statistic, which we use in the following. The log-odds is calculated as \(B = \log_e(\text{posterior odds})\) that the feature is differentially methylated.

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.

A characteristic “volcano” shape should be seen. Let’s look at the results:
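As a minimal illustration of the sign convention (toy numbers, not study data): since the model here is fit on untransformed beta values, the “Dementia - CN” contrast estimate reported as logFC is just the difference of the group means, so a positive value indicates hypermethylation in Dementia.

```r
# Toy sketch (simulated beta values, not study data): for a single CpG,
# the "Dementia - CN" contrast on untransformed beta values is the
# difference of the group means.
beta_dementia <- c(0.62, 0.58, 0.65)  # hypothetical Dementia group
beta_cn       <- c(0.40, 0.45, 0.42)  # hypothetical CN group

logFC <- mean(beta_dementia) - mean(beta_cn)
logFC > 0  # TRUE: hypermethylated in Dementia relative to CN
```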

if(METHOD_FEATURE_FLAG == 4){
  full_results_m4 <- topTable(fit2_m4, number=Inf)
  full_results_m4 <- tibble::rownames_to_column(full_results_m4,"ID")
  head(full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  sorted_full_results_m4 <- full_results_m4[
    order(full_results_m4$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  library(ggplot2)
  ggplot(full_results_m4,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 4){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m4 <- full_results_m4 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m4, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Now, let’s change the y-axis to -log10(P value)

if(METHOD_FEATURE_FLAG == 4){
  ggplot(full_results_m4,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}

if(METHOD_FEATURE_FLAG == 4){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m4 <- full_results_m4 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m4, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Use “recipes” - Process Data
if(METHOD_FEATURE_FLAG == 4){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m4) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m4)

  processed_data_m4 <- bake(rec_prep, new_data = df_picked_m4)
  processed_data_m4_df <- as.data.frame(processed_data_m4)
  rownames(processed_data_m4_df) <- rownames(df_picked_m4)
  print(dim(processed_data_m4))
}
## [1] 315 283
if(METHOD_FEATURE_FLAG == 4){
  AfterProcess_FeatureName_m4<-colnames(processed_data_m4)
  print(length(AfterProcess_FeatureName_m4))
  head(AfterProcess_FeatureName_m4)
  tail(AfterProcess_FeatureName_m4)
}
## [1] 283
## [1] "cg20507276" "cg00977253" "cg27577781" "cg04970287" "cg05377703" "DX"
if(METHOD_FEATURE_FLAG == 4){
  levels(df_picked_m4$DX)
}
## [1] "CN"       "Dementia"
if(METHOD_FEATURE_FLAG == 4){
  lastColumn_NUM_m4<-dim(processed_data_m4)[2]
  last5Column_NUM_m4<-lastColumn_NUM_m4-5
  head(processed_data_m4[,last5Column_NUM_m4 :lastColumn_NUM_m4])
}
if(METHOD_FEATURE_FLAG == 4){
  print(levels(processed_data_m4$DX))
  print(dim(processed_data_m4))
}
## [1] "CN"       "Dementia"
## [1] 315 283

(5) Method Five - CN vs MCI

In this method, only the CN and MCI classes will be considered.

if(METHOD_FEATURE_FLAG == 5){
  
  df_fs_method5<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 5){
  phenotic_features_m5<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m5<-c(phenotic_features_m5,featureName_CpGs)
  df_picked_m5<-df_fs_method5[,pickedFeatureName_m5]

  df_picked_m5$DX<-as.factor(df_picked_m5$DX)
  df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
  head(df_picked_m5[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
  dim(df_picked_m5)
}
Filter and Change to the Binary Classification ‘CN vs MCI’
if(METHOD_FEATURE_FLAG == 5){
  df_picked_m5<-df_picked_m5 %>%  filter(DX != "Dementia") %>% droplevels()

  
  df_picked_m5$DX<-as.factor(df_picked_m5$DX)
  df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)

  head(df_picked_m5[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 5){
  print(dim(df_picked_m5))
  print(table(df_picked_m5$DX))
}
if(METHOD_FEATURE_FLAG == 5){
  df_fs_method5 <- df_fs_method5 %>%  filter(DX != "Dementia") %>% droplevels()
  df_fs_method5$DX<-as.factor(df_fs_method5$DX)
  print(head(df_fs_method5))
  print(dim(df_fs_method5))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 5){
  pheno_data_m5 <- df_picked_m5[,phenotic_features_m5] 
  print(head(pheno_data_m5[,1:5],n=3))

  design_m5 <- model.matrix(~0 + .,data=pheno_data_m5)

  colnames(design_m5)[colnames(design_m5) == "DXCN"] <- "CN"
  colnames(design_m5)[colnames(design_m5) == "DXMCI"] <- "MCI"

  print(head(design_m5))

  beta_values_m5 <- t(as.matrix(df_fs_method5[,featureName_CpGs]))

}

In order to perform the differential analysis (identifying Differentially Methylated Positions, DMPs), we have to define the contrast we are interested in. In Method 5 we focus on two groups (CN and MCI), so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 5){

  fit_m5 <- lmFit(beta_values_m5, design_m5)
  head(fit_m5$coefficients)


  contrast.matrix <- makeContrasts(MCI - CN, levels = design_m5)
 
  fit2_m5 <- contrasts.fit(fit_m5, contrast.matrix)

  # Apply the empirical Bayes step to get the differential methylation statistics and p-values.

  fit2_m5 <- eBayes(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  decideTests(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  dmp_results_m5_try1 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m5_try1)

}

The constraint is too tight; let’s relax it by dropping the FDR adjustment.

if(METHOD_FEATURE_FLAG == 5){
  # Identify DMPs, we will use this one:
  dmp_results_m5 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m5)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 5){

  significant_dmp_filter <- dmp_results_m5 != 0 
  significant_cpgs_m5_DMP <- rownames(dmp_results_m5)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m5_afterDMP<-c(phenotic_features_m5,significant_cpgs_m5_DMP)
  df_picked_m5<-df_picked_m5[,pickedFeatureName_m5_afterDMP]

  dim(df_picked_m5)
}
Visualize the Results of DMP

The “Volcano Plot” is one way to visualize the results of a differential analysis.

The x-axis shows the log fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).

Interpretation of logFC:

  • Positive logFC: indicates the measurement is higher in the first group than in the second, which here corresponds to hypermethylation (an increase in methylation).

  • Negative logFC: indicates the measurement is lower in the first group than in the second, which here corresponds to hypomethylation (a decrease in methylation) in the experimental condition relative to the reference.

  • logFC of 0: indicates no difference in the measurement between the two groups.
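As a quick sanity check of the logFC formula, here is a toy example with made-up group means (not values from this data set):

```r
# Toy illustration of the logFC formula above (made-up numbers, not this data).
mean_group1 <- 0.8   # e.g., hypothetical mean beta value in group 1
mean_group2 <- 0.4   # e.g., hypothetical mean beta value in group 2
logFC <- log2(mean_group1 / mean_group2)
logFC  # 1: the first group is hypermethylated relative to the second
```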

The y-axis shows a measure of statistical significance, such as the log-odds, or “B” statistic. In the following we use the B statistic, where the log-odds is defined by \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
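Since \(B = \log_e(\text{posterior odds})\), it can be mapped back to a posterior probability of differential methylation with the logistic function; a minimal sketch with toy B values (not taken from this analysis):

```r
# Map limma's B statistic (log posterior odds) to a posterior probability:
# p = exp(B) / (1 + exp(B)), i.e., plogis(B). Toy values only.
B <- c(-2, 0, 3)
posterior_prob <- plogis(B)
round(posterior_prob, 3)  # 0.119 0.500 0.953 -- B = 0 means 50/50 odds
```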

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 5){
  full_results_m5 <- topTable(fit2_m5, number=Inf)
  full_results_m5 <- tibble::rownames_to_column(full_results_m5,"ID")
  head(full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  sorted_full_results_m5 <- full_results_m5[
    order(full_results_m5$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  library(ggplot2)
  ggplot(full_results_m5,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s redraw the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m5 <- full_results_m5 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m5, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to \(-\log_{10}(\text{P value})\)

if(METHOD_FEATURE_FLAG == 5){
  ggplot(full_results_m5,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m5 <- full_results_m5 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m5, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “recipes” to Process the Data
if(METHOD_FEATURE_FLAG == 5){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m5) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m5)

  processed_data_m5 <- bake(rec_prep, new_data = df_picked_m5)
  processed_data_m5_df <- as.data.frame(processed_data_m5)
  rownames(processed_data_m5_df) <- rownames(df_picked_m5)
  print(dim(processed_data_m5))
}
if(METHOD_FEATURE_FLAG == 5){
  AfterProcess_FeatureName_m5<-colnames(processed_data_m5)
  print(length(AfterProcess_FeatureName_m5))
  head(AfterProcess_FeatureName_m5)
  tail(AfterProcess_FeatureName_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  levels(df_picked_m5$DX)
}
if(METHOD_FEATURE_FLAG == 5){
  lastColumn_NUM_m5<-dim(processed_data_m5)[2]
  last5Column_NUM_m5<-lastColumn_NUM_m5-5
  head(processed_data_m5[,last5Column_NUM_m5 :lastColumn_NUM_m5])
}
if(METHOD_FEATURE_FLAG == 5){
  print(levels(processed_data_m5$DX))
  print(dim(processed_data_m5))
}

(5) Method Six - MCI vs AD (Dementia)

In this method, only the MCI and Dementia (AD) classes will be considered.

if(METHOD_FEATURE_FLAG == 6){
  
  df_fs_method6<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 6){
  phenotic_features_m6<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m6<-c(phenotic_features_m6,featureName_CpGs)
  df_picked_m6<-df_fs_method6[,pickedFeatureName_m6]

  df_picked_m6$DX<-as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
  head(df_picked_m6[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
  dim(df_picked_m6)
}
Filter and Reduce to a ‘MCI vs Dementia’ Classification
if(METHOD_FEATURE_FLAG == 6){
  df_picked_m6<-df_picked_m6 %>%  filter(DX != "CN") %>% droplevels()

  
  df_picked_m6$DX<-as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)

  head(df_picked_m6[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 6){
  print(dim(df_picked_m6))
  print(table(df_picked_m6$DX))
}
if(METHOD_FEATURE_FLAG == 6){
  df_fs_method6 <- df_fs_method6 %>%  filter(DX != "CN") %>% droplevels()
  df_fs_method6$DX<-as.factor(df_fs_method6$DX)
  print(head(df_fs_method6))
  print(dim(df_fs_method6))
}
Perform DMP - Use limma
if(METHOD_FEATURE_FLAG == 6){
  pheno_data_m6 <- df_picked_m6[,phenotic_features_m6] 
  print(head(pheno_data_m6[,1:5],n=3))

  design_m6 <- model.matrix(~0 + .,data=pheno_data_m6)

  colnames(design_m6)[colnames(design_m6) == "DXDementia"] <- "Dementia"
  colnames(design_m6)[colnames(design_m6) == "DXMCI"] <- "MCI"

  print(head(design_m6))

  beta_values_m6 <- t(as.matrix(df_fs_method6[,featureName_CpGs]))

}

To identify Differentially Methylated Positions (DMPs), we first have to define the contrast we are interested in. In method 6 we compare two groups (MCI and Dementia), so there is a single contrast of interest.

if(METHOD_FEATURE_FLAG == 6){

  fit_m6 <- lmFit(beta_values_m6, design_m6)
  head(fit_m6$coefficients)


  contrast.matrix <- makeContrasts(MCI - Dementia, levels = design_m6)
 
  fit2_m6 <- contrasts.fit(fit_m6, contrast.matrix)

  # Apply the empirical Bayes step to get our differential methylation statistics and p-values.

  fit2_m6 <- eBayes(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  decideTests(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  dmp_results_m6_try1 <- decideTests(
    fit2_m6, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m6_try1)

}

The constraint is too tight; let’s relax it by dropping the FDR adjustment.

if(METHOD_FEATURE_FLAG == 6){
  # Identify DMPs, we will use this one:
  dmp_results_m6 <- decideTests(
    fit2_m6, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m6)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 6){

  significant_dmp_filter <- dmp_results_m6 != 0 
  significant_cpgs_m6_DMP <- rownames(dmp_results_m6)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m6_afterDMP<-c(phenotic_features_m6,significant_cpgs_m6_DMP)
  df_picked_m6<-df_picked_m6[,pickedFeatureName_m6_afterDMP]

  dim(df_picked_m6)
}
Visualize the Results of DMP

The “Volcano Plot” is one way to visualize the results of a differential analysis.

The x-axis shows the log fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).

Interpretation of logFC:

  • Positive logFC: indicates the measurement is higher in the first group than in the second, which here corresponds to hypermethylation (an increase in methylation).

  • Negative logFC: indicates the measurement is lower in the first group than in the second, which here corresponds to hypomethylation (a decrease in methylation) in the experimental condition relative to the reference.

  • logFC of 0: indicates no difference in the measurement between the two groups.

The y-axis shows a measure of statistical significance, such as the log-odds, or “B” statistic. In the following we use the B statistic, where the log-odds is defined by \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 6){
  full_results_m6 <- topTable(fit2_m6, number=Inf)
  full_results_m6 <- tibble::rownames_to_column(full_results_m6,"ID")
  head(full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  sorted_full_results_m6 <- full_results_m6[
    order(full_results_m6$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  library(ggplot2)
  ggplot(full_results_m6,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s redraw the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 6){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m6 <- full_results_m6 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m6, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to \(-\log_{10}(\text{P value})\)

if(METHOD_FEATURE_FLAG == 6){
  ggplot(full_results_m6,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 6){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m6 <- full_results_m6 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m6, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “recipes” to Process the Data
if(METHOD_FEATURE_FLAG == 6){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m6) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m6)

  processed_data_m6 <- bake(rec_prep, new_data = df_picked_m6)
  processed_data_m6_df <- as.data.frame(processed_data_m6)
  rownames(processed_data_m6_df) <- rownames(df_picked_m6)
  print(dim(processed_data_m6))
}
if(METHOD_FEATURE_FLAG == 6){
  AfterProcess_FeatureName_m6<-colnames(processed_data_m6)
  print(length(AfterProcess_FeatureName_m6))
  head(AfterProcess_FeatureName_m6)
  tail(AfterProcess_FeatureName_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  levels(df_picked_m6$DX)
}
if(METHOD_FEATURE_FLAG == 6){
  lastColumn_NUM_m6<-dim(processed_data_m6)[2]
  last5Column_NUM_m6<-lastColumn_NUM_m6-5
  head(processed_data_m6[,last5Column_NUM_m6 :lastColumn_NUM_m6])
}
if(METHOD_FEATURE_FLAG == 6){
  print(levels(processed_data_m6$DX))
  print(dim(processed_data_m6))
}

1.7 INPUT - Model Train

The name of “processed_data” can be one of:

  1. “processed_data_m1”, which uses method one to process the data.

  2. “processed_data_m2”, which uses method two to process the data; note that the features will be principal components.

  3. “processed_data_m3”, which uses method three to process the data. This method converts “DX” to a binary class: “CN” stays the same, while “MCI” and “Dementia” are merged into “CI”.

    Note that “processed_data_m3_df” is the data frame version of “processed_data_m3”, with sample names as row names.

  4. “processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), limiting it to the CN and Dementia (AD) classes.

  5. “processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), limiting it to the CN and MCI classes.

  6. “processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), limiting it to the MCI and Dementia classes.

The name of “AfterProcess_FeatureName” (which includes the “DX” label) can be one of:

  1. “AfterProcess_FeatureName_m1”, the column names of the data frame processed with method one.
  2. “AfterProcess_FeatureName_m2”, the column names from the principal component method.
  3. “AfterProcess_FeatureName_m3”, the column names of the data frame processed with method three (binary class: “CN” stays the same; “MCI” and “Dementia” are merged into “CI”).
  4. “AfterProcess_FeatureName_m4”, the column names of the data frame processed with method four (“MCI” class dropped; limited to the CN and Dementia (AD) classes).
  5. “AfterProcess_FeatureName_m5”, the column names of the data frame processed with method five (“Dementia” class dropped; limited to the CN and MCI classes).
  6. “AfterProcess_FeatureName_m6”, the column names of the data frame processed with method six (“CN” class dropped; limited to the MCI and Dementia classes).
if(METHOD_FEATURE_FLAG==1){
  
  processed_dataFrame<-processed_data_m1_df
  processed_data<-processed_data_m1

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m1

  
}


if(METHOD_FEATURE_FLAG==2){
  
  processed_dataFrame<-processed_data_m2_df
  processed_data<-processed_data_m2

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m2

  
}

if(METHOD_FEATURE_FLAG==3){
  
  processed_dataFrame<-processed_data_m3_df
  processed_data<-processed_data_m3

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m3

  
}

if(METHOD_FEATURE_FLAG==4){
  
  processed_dataFrame<-processed_data_m4_df
  processed_data<-processed_data_m4

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m4

  
}

if(METHOD_FEATURE_FLAG==5){
  
  processed_dataFrame<-processed_data_m5_df
  processed_data<-processed_data_m5

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m5

  
}

if(METHOD_FEATURE_FLAG==6){
  
  processed_dataFrame<-processed_data_m6_df
  processed_data<-processed_data_m6

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m6

  
}
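The six if blocks above can also be collapsed into a single lookup that builds the object names from the flag and fetches them with `get()`; a compact sketch, with toy stand-in objects so it runs on its own:

```r
# Compact alternative to the six if blocks: derive the object names from the
# flag and fetch them with get(). The first four lines are toy stand-ins; in
# the real document the m1..m6 objects already exist from earlier sections.
METHOD_FEATURE_FLAG <- 5
processed_data_m5 <- data.frame(age.now = 78.6, DX = "CN")   # toy stand-in
processed_data_m5_df <- processed_data_m5                    # toy stand-in
AfterProcess_FeatureName_m5 <- colnames(processed_data_m5)   # toy stand-in

suffix <- paste0("m", METHOD_FEATURE_FLAG)
processed_dataFrame      <- get(paste0("processed_data_", suffix, "_df"))
processed_data           <- get(paste0("processed_data_", suffix))
AfterProcess_FeatureName <- get(paste0("AfterProcess_FeatureName_", suffix))
```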
print(head(processed_dataFrame))
##                      age.now          PC1         PC2          PC3 cg02483977 cg17348244 cg17002338 cg16020483 cg02095601 cg23916408 cg10542624 cg11834635 cg25174111 cg03640465 cg23840008 cg11331837
## 200223270003_R03C01 78.60000 -0.172761185  0.05745834  0.005055871  0.9069841 0.81793075  0.2684163  0.1673606  0.9161259  0.9154993 0.02189577  0.8880887  0.8573844  0.2531644 0.66547425 0.57150125
## 200223270003_R06C01 80.40000 -0.003667305  0.08372861  0.029143653  0.8814641 0.07241099  0.2811103  0.1209622  0.2233062  0.8886255 0.54330620  0.2493491  0.2567745  0.2904433 0.88483246 0.03182862
## 200223270003_R07C01 78.16441 -0.186779607 -0.01117250 -0.032302430  0.5338863 0.78025001  0.2706349  0.2499647  0.8978191  0.8872447 0.54991492  0.2210428  0.1903803  0.9024530 0.09020907 0.03832164
##                     cg12012426 cg06032337 cg12434901 cg05125667 cg08397053 cg00999469 cg01608425 cg27639199 cg24851651 cg22071943 cg23813394 cg13232075 cg14252149 cg16390578 cg04831745 cg10713875
## 200223270003_R03C01  0.9434768  0.5657198  0.8458468 0.54151552 0.04199567  0.2857719  0.9264388 0.67552763 0.05358297  0.2442648 0.48811365 0.02500117 0.02450779 0.20983422 0.71214149  0.8973298
## 200223270003_R06C01  0.9220044  0.5653758  0.8299579 0.49090787 0.04437741  0.2499229  0.8887753 0.06233093 0.05968923  0.2644581 0.02943436 0.02823740 0.02382413 0.06389068 0.06871768  0.1322912
## 200223270003_R07C01  0.9241284  0.5229594  0.8482994 0.01590936 0.59796746  0.2819622  0.9065432 0.05701332 0.60864179  0.2599947 0.92935625 0.02527324 0.56212480 0.23101450 0.90994644  0.8860284
##                     cg11716267 cg16268937 cg18339359 cg12284872 cg20218135 cg10844498 cg11826549 cg18662228 cg08407901 cg04577745 cg04073914 cg02302183 cg22251955 cg05321907 cg27187580 cg10985055
## 200223270003_R03C01 0.04959702  0.8931712  0.9040272  0.7414569 0.64278153  0.1391318 0.04794983  0.8730153  0.0253267  0.2681033 0.03089677  0.9191148 0.02254441  0.1782629  0.6643576  0.8631895
## 200223270003_R06C01 0.49143010  0.9034556  0.8552121  0.7725267 0.06509247  0.1385549 0.03672380  0.8602464  0.4553539  0.8570624 0.89962516  0.8749250 0.02714054  0.8427929  0.6914924  0.5456633
## 200223270003_R07C01 0.45857830  0.8928450  0.3073106  0.7573369 0.65642359  0.7374725 0.51173417  0.8683578  0.4025706  0.9002276 0.47195215  0.8888247 0.89577950  0.8320504  0.9357074  0.8825100
##                     cg11835797 cg19248407 cg04798314 cg06002867 cg27341708 cg11266396 cg12466610 cg03327352 cg09829645 cg17419220 cg20300784 cg14609402 cg16733676 cg05130642 cg04845852 cg25649515
## 200223270003_R03C01  0.9007408  0.8313131 0.07119798 0.84888752 0.02613847 0.01905761 0.59131778  0.8786878  0.5191302 0.43470227 0.86609999  0.9087631  0.8904541  0.8644077  0.9212268 0.92357530
## 200223270003_R06C01  0.8944957  0.8525281 0.09248843 0.02698175 0.86893582 0.53122014 0.06939623  0.3042310  0.0431402 0.02781411 0.03091187  0.9109735  0.1698111  0.3661324  0.5118209 0.58958387
## 200223270003_R07C01  0.8168544  0.8467857 0.06972566 0.48042117 0.02642300 0.02421064 0.04527733  0.8273211  0.5195872 0.42803809 0.90319796  0.9099145  0.9203317  0.3039272  0.9034373 0.02958575
##                     cg08779649 cg07456472 cg13885788 cg25561557  cg22901347 cg00156497  cg03088219 cg12074150 cg10058204 cg09650803 cg12240569 cg24638099 cg17906851 cg16089727 cg27114706 cg26089705
## 200223270003_R03C01 0.45076825  0.5856904  0.9369476 0.03851635 0.001690332  0.5194653 0.007435243 0.18602738  0.5834496  0.8954464 0.02690547  0.4262170  0.9529718 0.54996692  0.9359259 0.50810373
## 200223270003_R06C01 0.04810217  0.3886482  0.5163017 0.47259480 0.103413834  0.9024063 0.120155222 0.14231506  0.0549494  0.9113477 0.46030640  0.8787392  0.6462151 0.05876736  0.9285384 0.03322136
## 200223270003_R07C01 0.42715969  0.9186405  0.9183376 0.43364249 0.632991482  0.9067989 0.826554308 0.09201303  0.5689591  0.2518414 0.86185839  0.8682765  0.9553497 0.85485461  0.4787397 0.03118009
##                     cg04867412 cg02823329 cg13688351 cg05059349 cg00841008 cg10507965 cg14780448 cg10786572 cg02901522 cg15535896 cg16310958 cg24065597 cg18821122 cg20704148 cg05841700 cg23836570
## 200223270003_R03C01  0.8796800  0.6464005  0.8799586 0.04507417 0.61899333  0.4010973 0.67021018  0.5982086  0.9372901  0.9253926  0.9300073  0.2221098  0.5901603 0.02409027  0.9146488 0.54259383
## 200223270003_R06C01  0.4497115  0.9633930  0.8814820 0.03898752 0.05401588  0.4033691 0.62073547  0.0935115  0.4954978  0.3320191  0.9228871  0.7036129  0.5779620 0.02580923  0.3737990 0.03267304
## 200223270003_R07C01  0.4445373  0.6617541  0.4646991 0.85329923 0.90769205  0.3869543 0.04425741  0.8436837  0.9381188  0.9409104  0.8539019  0.2407676  0.9251431 0.47357786  0.5046468 0.59939745
##                     cg16536985 cg02495179 cg10829391 cg02494911 cg02078724 cg04242342 cg05373298 cg09247979 cg04771146 cg13799572 cg18310072 cg18037388 cg03172493 cg06864789 cg00729708 cg27224751
## 200223270003_R03C01  0.5418687  0.7373055  0.5929616  0.2416332  0.2896133  0.8167892 0.02652391  0.5706177  0.7648566  0.8449584  0.1449858  0.7545086 0.63362492  0.4605312  0.1188099 0.03214912
## 200223270003_R06C01  0.8392044  0.5588114  0.9411947  0.2520909  0.2805612  0.8040357 0.83538124  0.5090215  0.3125007  0.4409219  0.9321264  0.7294565 0.06148804  0.8751365  0.1206326 0.83123722
## 200223270003_R07C01  0.8822891  0.5273309  0.9322956  0.2457032  0.2739571  0.8286115 0.89506024  0.5066661  0.2909958  0.8516975  0.9108063  0.2391659 0.64562298  0.4902033  0.7636159 0.79732117
##                     cg16527629 cg26983017 cg24859648 cg00051154 cg00675157 cg02656016 cg07304760 cg06264882 cg22274273 cg04768387 cg23350716 cg02217425 cg11227702 cg12333628 cg05351360 cg05161773
## 200223270003_R03C01  0.4365003 0.03145466 0.44392797 0.08370609  0.9242325  0.2355680  0.5798534 0.43678655  0.4246379  0.9465814  0.7876873  0.1032503 0.49184121  0.9092861 0.03855181  0.4154907
## 200223270003_R06C01  0.0708336 0.84677625 0.03341185 0.61288950  0.9254708  0.9052318  0.5575516 0.43703442  0.4196796  0.9098563  0.6960544  0.6592850 0.02543724  0.5084647 0.76395533  0.8526849
## 200223270003_R07C01  0.4492586 0.53922255 0.43582347 0.07638127  0.5447244  0.8653682  0.9195617 0.02439581  0.4164100  0.9413240  0.7387498  0.8792021 0.45150971  0.5229394 0.77000888  0.4259275
##                     cg27286614 cg14764203 cg14181112 cg20913114 cg02932958 cg22681945 cg17811452 cg15775217 cg06624143 cg01280698 cg03057303 cg11314779 cg00421199 cg16715186 cg02489327 cg05749243
## 200223270003_R03C01  0.5933858  0.4683709  0.1615405 0.80382984  0.4210489  0.8388195 0.82740141  0.9168327  0.4899758 0.88462009  0.8923039  0.8966100  0.8532461  0.7946153  0.8616312  0.9209685
## 200223270003_R06C01  0.6348795  0.8916566  0.3424621 0.03158439  0.3825995  0.8700500 0.09338396  0.6042521  0.9107688 0.88471320  0.4954311  0.8908661  0.8891803  0.8124316  0.8777949  0.9143061
## 200223270003_R07C01  0.9468370  0.8714472  0.2178314 0.81256840  0.7617081  0.3344105 0.79817238  0.9062231  0.9217350 0.06370005  0.4695066  0.9048316  0.8937751  0.7773263  0.4205073  0.9121180
##                     cg09518270 cg05455372 cg12738248 cg04218584 cg19848641 cg01013522 cg02627240 cg05096415 cg07951602 cg02389264 cg03167407 cg24861747 cg19512141 cg10701746 cg00332268 cg06378561
## 200223270003_R03C01  0.8870663  0.5532370 0.88010292  0.8971263  0.9155493  0.8862821 0.57129408  0.5177819  0.8766842  0.7900942  0.7610292  0.4309505  0.7903543  0.4868342  0.9044887  0.9377503
## 200223270003_R06C01  0.8765622  0.6375708 0.51121855  0.8491768  0.4888000  0.5425308 0.05309659  0.6288426  0.8918089  0.7789974  0.3087606  0.8071462  0.8404684  0.4927257  0.5777209  0.5154019
## 200223270003_R07C01  0.8135001  0.8095964 0.09131476  0.9008137  0.9139292  0.8429862 0.52179136  0.6060271  0.8706938  0.4174463  0.2455453  0.3347317  0.2202759  0.8552180  0.5848006  0.9403569
##                     cg17386240 cg12471283 cg03187614 cg04467639 cg00648024 cg17623720 cg01802772 cg11706829 cg02356645 cg14465143 cg06012621 cg12556569 cg05813498 cg11173002 cg13226272 cg26007606
## 200223270003_R03C01  0.7144809  0.8658731  0.8826518  0.6400206 0.40202875  0.8988624 0.02361869  0.5444785  0.5833923  0.5543068  0.8579519 0.03924599  0.9039353  0.5913599  0.5410002  0.5615550
## 200223270003_R06C01  0.8074824  0.6963410  0.5131472  0.5657041 0.05579011  0.8172384 0.02401520  0.5669449  0.5701428  0.2702875  0.5325037 0.48636893  0.6252849  0.1878736  0.4437070  0.1463111
## 200223270003_R07C01  0.7227918  0.6680611  0.5281030  0.6302917 0.03708944  0.8226085 0.02200957  0.8746449  0.5683381  0.2621492  0.6263080 0.46498877  0.9086932  0.5150840  0.0265215  0.8101800
##                     cg03628603 cg26739327 cg08584917 cg23161429 cg07138269 cg07584620 cg26081710 cg12213037 cg13080267 cg25758034 cg14924512 cg12293347 cg09993718 cg07480176 cg21757617 cg21501207
## 200223270003_R03C01  0.9157246  0.7693268  0.9019732  0.9099619  0.9426707  0.3763980  0.9198212   0.248785 0.78371483  0.6649219  0.9160885  0.9253031  0.7227856  0.3760452  0.4429909  0.6813712
## 200223270003_R06C01  0.8851075  0.8727608  0.9187789  0.8833895  0.5057781  0.8530961  0.8801892   0.812695 0.09436069  0.2393844  0.9088414  0.9176094  0.4378752  0.6998389  0.4472538  0.4747229
## 200223270003_R07C01  0.8923890  0.8340445  0.6007449  0.9134709  0.9400527  0.3888623  0.9153264   0.506374 0.09351259  0.7071501  0.9081681  0.6028463  0.7067889  0.2189042  0.4339315  0.7422003
##                     cg16098618 cg20678988 cg06875704 cg16431720 cg01097733 cg21578644 cg16858433 cg18861767 cg12702014 cg16338321 cg12776173 cg18029737 cg12306781 cg15591384 cg19555075 cg01130884
## 200223270003_R03C01  0.2571464  0.8548886  0.9181165  0.8692449  0.6753081  0.9260863  0.9194211  0.7847380  0.7848681  0.8294062  0.8730635  0.9016634  0.8663817  0.7870275  0.4921409  0.6230659
## 200223270003_R06C01  0.6899734  0.7786685  0.9200461  0.8773137  0.9131513  0.9159726  0.9271632  0.4734572  0.8065993  0.4918708  0.7009491  0.7376586  0.8027798  0.7429614  0.4261618  0.2847748
## 200223270003_R07C01  0.6488005  0.8260541  0.9048289  0.8988328  0.6832952  0.9178001  0.9288986  0.7312175  0.7458594  0.5245645  0.1136716  0.9397667  0.8787250  0.8346279  0.4694729  0.2313285
##                     cg03084184 cg17329602 cg04124201 cg12858518 cg21533482 cg23698271 cg26948066 cg26474732 cg15700429 cg12421087 cg03359067 cg21575308 cg04109990 cg11109139 cg12279734 cg20070588
## 200223270003_R03C01  0.7877128  0.8189317  0.3308589  0.9285252  0.8288469  0.9109565  0.5026045  0.8184088  0.9114530  0.5399655  0.8628564 0.44702405  0.6476604  0.6350109  0.1494651  0.5057088
## 200223270003_R06C01  0.4546397  0.8478185  0.3241613  0.9017533  0.6766373  0.9051701  0.9101976  0.7358417  0.8838233  0.5400348  0.8144536 0.44792570  0.6692040  0.6904482  0.8760759  0.8654344
## 200223270003_R07C01  0.7812413  0.8596400  0.4332693  0.9187879  0.6235932  0.8804362  0.9379543  0.7509296  0.9095363  0.5291975  0.8737908 0.02822675  0.9024920  0.6274025  0.8674214  0.8425849
##                     cg20094343 cg12108278 cg10666341 cg23432430 cg15399577 cg19503462 cg11787167 cg17296678 cg05138546 cg08242313 cg09584650 cg26889118 cg22653957 cg12080266 cg12689021 cg21986118
## 200223270003_R03C01  0.7128750  0.9243869  0.6731062  0.9455418  0.8785443  0.4537684 0.04673831  0.5653917  0.6230487  0.8953645 0.09661586  0.9154836  0.6442184  0.9450629  0.7449475  0.6571296
## 200223270003_R06C01  0.3291595  0.9068995  0.6443180  0.9418716  0.8703169  0.6997359 0.32564508  0.5272971  0.8963047  0.8573493 0.52399749  0.9101336  0.9531308  0.9363381  0.7872237  0.7034445
## 200223270003_R07C01  0.4013815  0.9131367  0.8970292  0.9426559  0.8968856  0.7189778 0.43162543  0.7661613  0.9057159  0.8992114 0.11587211  0.5759967  0.6534542  0.6398247  0.7523141  0.9055894
##                     cg12925689 cg26052728 cg17044529 cg24422984 cg14904299 cg07971231 cg04664583 cg26757229 cg09216282 cg03982462 cg15501526 cg01680303 cg06371647 cg06536614 cg15730644 cg04033559
## 200223270003_R03C01 0.38196419  0.1513937  0.9117895  0.5462594  0.2712472  0.8406145  0.5881190  0.1422661  0.9244259  0.6023731  0.6319253  0.1344941  0.8198684  0.5746694  0.4353906  0.8768243
## 200223270003_R06C01 0.02873309  0.5254754  0.9290636  0.5193121  0.8364544  0.8447914  0.9352717  0.7933794  0.9263996  0.8778458  0.7435100  0.7573869  0.8069537  0.5773468  0.8763048  0.8257388
## 200223270003_R07C01 0.38592071  0.5600724  0.9402858  0.1970387  0.8193867  0.8874706  0.9350230  0.8074830  0.9352308  0.8860227  0.7756577  0.4772204  0.2925124  0.5848917  0.4833709  0.8900962
##                     cg02981548 cg24104387 cg17429539 cg02872767 cg11358878 cg00322003 cg14170504 cg23947654 cg18526121 cg11247378 cg03115532 cg07152869 cg26901661 cg17118775 cg03392100 cg06870118
## 200223270003_R03C01  0.5220037  0.5339034  0.7100923  0.3886537 0.83252951  0.5702070 0.02236650  0.8079296  0.4762313  0.7874849  0.8659608   0.505063  0.8754981  0.5585676  0.9227394  0.8100144
## 200223270003_R06C01  0.5098965  0.3007614  0.7660838  0.9099575 0.87521203  0.3077122 0.02988245  0.8017579  0.4833367  0.4807942  0.8533871   0.835249  0.9021064  0.2916054  0.8902340  0.7802055
## 200223270003_R07C01  0.5660985  0.7509780  0.6984969  0.8603283 0.08917903  0.6104341 0.48543531  0.7584946  0.7761450  0.4537348  0.4416574   0.519430  0.8556831  0.2868948  0.4359657  0.7917257
##                     cg27452255 cg13375589 cg06697310 cg16361249 cg11479389 cg20507276 cg00977253 cg27577781 cg04970287 cg05377703       DX
## 200223270003_R03C01  0.6593379  0.4578240  0.8653044 0.52843073  0.2217463 0.38721972  0.9145988  0.8113185  0.8875750  0.8213047       CN
## 200223270003_R06C01  0.9012217  0.6025638  0.2405168 0.09039669  0.5568440 0.47978438  0.8944518  0.8144274  0.4651667  0.5152514       CN
## 200223270003_R07C01  0.8898635  0.8182629  0.8479193 0.42039062  0.5887680 0.02261996  0.9150206  0.7970617  0.9092326  0.7773036 Dementia
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
print(dim(processed_dataFrame))
## [1] 315 283
print(length(AfterProcess_FeatureName))
## [1] 283
print(head(processed_data))
## # A tibble: 6 × 283
##   age.now      PC1     PC2      PC3 cg02483977 cg17348244 cg17002338 cg16020483 cg02095601 cg23916408 cg10542624 cg11834635 cg25174111 cg03640465 cg23840008 cg11331837 cg12012426 cg06032337 cg12434901
##     <dbl>    <dbl>   <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1    78.6 -0.173    0.0575  0.00506      0.907     0.818       0.268      0.167      0.916      0.915     0.0219      0.888      0.857      0.253     0.665      0.572       0.943     0.566       0.846
## 2    80.4 -0.00367  0.0837  0.0291       0.881     0.0724      0.281      0.121      0.223      0.889     0.543       0.249      0.257      0.290     0.885      0.0318      0.922     0.565       0.830
## 3    78.2 -0.187   -0.0112 -0.0323       0.534     0.780       0.271      0.250      0.898      0.887     0.550       0.221      0.190      0.902     0.0902     0.0383      0.924     0.523       0.848
## 4    80.7 -0.0379   0.0157 -0.00869      0.883     0.0473      0.931      0.250      0.914      0.152     0.527       0.203      0.205      0.908     0.0579     0.540       0.927     0.0240      0.809
## 5    71.5 -0.139    0.0299 -0.0317       0.897     0.823       0.909      0.894      0.899      0.898     0.515       0.218      0.866      0.266     0.0535     0.894       0.356     0.525       0.818
## 6    83.5 -0.213    0.0518  0.0186       0.535     0.0628      0.226      0.877      0.914      0.209     0.935       0.160      0.203      0.924     0.559      0.0338      0.316     0.483       0.168
## # ℹ 264 more variables: cg05125667 <dbl>, cg08397053 <dbl>, cg00999469 <dbl>, cg01608425 <dbl>, cg27639199 <dbl>, cg24851651 <dbl>, cg22071943 <dbl>, cg23813394 <dbl>, cg13232075 <dbl>,
## #   cg14252149 <dbl>, cg16390578 <dbl>, cg04831745 <dbl>, cg10713875 <dbl>, cg11716267 <dbl>, cg16268937 <dbl>, cg18339359 <dbl>, cg12284872 <dbl>, cg20218135 <dbl>, cg10844498 <dbl>,
## #   cg11826549 <dbl>, cg18662228 <dbl>, cg08407901 <dbl>, cg04577745 <dbl>, cg04073914 <dbl>, cg02302183 <dbl>, cg22251955 <dbl>, cg05321907 <dbl>, cg27187580 <dbl>, cg10985055 <dbl>,
## #   cg11835797 <dbl>, cg19248407 <dbl>, cg04798314 <dbl>, cg06002867 <dbl>, cg27341708 <dbl>, cg11266396 <dbl>, cg12466610 <dbl>, cg03327352 <dbl>, cg09829645 <dbl>, cg17419220 <dbl>,
## #   cg20300784 <dbl>, cg14609402 <dbl>, cg16733676 <dbl>, cg05130642 <dbl>, cg04845852 <dbl>, cg25649515 <dbl>, cg08779649 <dbl>, cg07456472 <dbl>, cg13885788 <dbl>, cg25561557 <dbl>,
## #   cg22901347 <dbl>, cg00156497 <dbl>, cg03088219 <dbl>, cg12074150 <dbl>, cg10058204 <dbl>, cg09650803 <dbl>, cg12240569 <dbl>, cg24638099 <dbl>, cg17906851 <dbl>, cg16089727 <dbl>,
## #   cg27114706 <dbl>, cg26089705 <dbl>, cg04867412 <dbl>, cg02823329 <dbl>, cg13688351 <dbl>, cg05059349 <dbl>, cg00841008 <dbl>, cg10507965 <dbl>, cg14780448 <dbl>, cg10786572 <dbl>, …
print(dim(processed_data))
## [1] 315 283
print(AfterProcess_FeatureName)
##   [1] "age.now"    "PC1"        "PC2"        "PC3"        "cg02483977" "cg17348244" "cg17002338" "cg16020483" "cg02095601" "cg23916408" "cg10542624" "cg11834635" "cg25174111" "cg03640465" "cg23840008"
##  [16] "cg11331837" "cg12012426" "cg06032337" "cg12434901" "cg05125667" "cg08397053" "cg00999469" "cg01608425" "cg27639199" "cg24851651" "cg22071943" "cg23813394" "cg13232075" "cg14252149" "cg16390578"
##  [31] "cg04831745" "cg10713875" "cg11716267" "cg16268937" "cg18339359" "cg12284872" "cg20218135" "cg10844498" "cg11826549" "cg18662228" "cg08407901" "cg04577745" "cg04073914" "cg02302183" "cg22251955"
##  [46] "cg05321907" "cg27187580" "cg10985055" "cg11835797" "cg19248407" "cg04798314" "cg06002867" "cg27341708" "cg11266396" "cg12466610" "cg03327352" "cg09829645" "cg17419220" "cg20300784" "cg14609402"
##  [61] "cg16733676" "cg05130642" "cg04845852" "cg25649515" "cg08779649" "cg07456472" "cg13885788" "cg25561557" "cg22901347" "cg00156497" "cg03088219" "cg12074150" "cg10058204" "cg09650803" "cg12240569"
##  [76] "cg24638099" "cg17906851" "cg16089727" "cg27114706" "cg26089705" "cg04867412" "cg02823329" "cg13688351" "cg05059349" "cg00841008" "cg10507965" "cg14780448" "cg10786572" "cg02901522" "cg15535896"
##  [91] "cg16310958" "cg24065597" "cg18821122" "cg20704148" "cg05841700" "cg23836570" "cg16536985" "cg02495179" "cg10829391" "cg02494911" "cg02078724" "cg04242342" "cg05373298" "cg09247979" "cg04771146"
## [106] "cg13799572" "cg18310072" "cg18037388" "cg03172493" "cg06864789" "cg00729708" "cg27224751" "cg16527629" "cg26983017" "cg24859648" "cg00051154" "cg00675157" "cg02656016" "cg07304760" "cg06264882"
## [121] "cg22274273" "cg04768387" "cg23350716" "cg02217425" "cg11227702" "cg12333628" "cg05351360" "cg05161773" "cg27286614" "cg14764203" "cg14181112" "cg20913114" "cg02932958" "cg22681945" "cg17811452"
## [136] "cg15775217" "cg06624143" "cg01280698" "cg03057303" "cg11314779" "cg00421199" "cg16715186" "cg02489327" "cg05749243" "cg09518270" "cg05455372" "cg12738248" "cg04218584" "cg19848641" "cg01013522"
## [151] "cg02627240" "cg05096415" "cg07951602" "cg02389264" "cg03167407" "cg24861747" "cg19512141" "cg10701746" "cg00332268" "cg06378561" "cg17386240" "cg12471283" "cg03187614" "cg04467639" "cg00648024"
## [166] "cg17623720" "cg01802772" "cg11706829" "cg02356645" "cg14465143" "cg06012621" "cg12556569" "cg05813498" "cg11173002" "cg13226272" "cg26007606" "cg03628603" "cg26739327" "cg08584917" "cg23161429"
## [181] "cg07138269" "cg07584620" "cg26081710" "cg12213037" "cg13080267" "cg25758034" "cg14924512" "cg12293347" "cg09993718" "cg07480176" "cg21757617" "cg21501207" "cg16098618" "cg20678988" "cg06875704"
## [196] "cg16431720" "cg01097733" "cg21578644" "cg16858433" "cg18861767" "cg12702014" "cg16338321" "cg12776173" "cg18029737" "cg12306781" "cg15591384" "cg19555075" "cg01130884" "cg03084184" "cg17329602"
## [211] "cg04124201" "cg12858518" "cg21533482" "cg23698271" "cg26948066" "cg26474732" "cg15700429" "cg12421087" "cg03359067" "cg21575308" "cg04109990" "cg11109139" "cg12279734" "cg20070588" "cg20094343"
## [226] "cg12108278" "cg10666341" "cg23432430" "cg15399577" "cg19503462" "cg11787167" "cg17296678" "cg05138546" "cg08242313" "cg09584650" "cg26889118" "cg22653957" "cg12080266" "cg12689021" "cg21986118"
## [241] "cg12925689" "cg26052728" "cg17044529" "cg24422984" "cg14904299" "cg07971231" "cg04664583" "cg26757229" "cg09216282" "cg03982462" "cg15501526" "cg01680303" "cg06371647" "cg06536614" "cg15730644"
## [256] "cg04033559" "cg02981548" "cg24104387" "cg17429539" "cg02872767" "cg11358878" "cg00322003" "cg14170504" "cg23947654" "cg18526121" "cg11247378" "cg03115532" "cg07152869" "cg26901661" "cg17118775"
## [271] "cg03392100" "cg06870118" "cg27452255" "cg13375589" "cg06697310" "cg16361249" "cg11479389" "cg20507276" "cg00977253" "cg27577781" "cg04970287" "cg05377703" "DX"
print("Number of Features :")
## [1] "Number of Features :"
Num_feaForProcess = length(AfterProcess_FeatureName)-1 # exclude the "DX" label
print(Num_feaForProcess) 
## [1] 282

2. Logistic Regression Model

2.1 Logistic Regression Model Training

df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)

set.seed(123)  # for reproducibility
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 221 283
dim(testData)
## [1]  94 283
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_modelTrain_LRM1 <- caret::confusionMatrix(predictions, testData$DX)

print(cm_modelTrain_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        4
##   Dementia  2       24
##                                           
##                Accuracy : 0.9362          
##                  95% CI : (0.8662, 0.9762)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 2.052e-08       
##                                           
##                   Kappa : 0.8442          
##                                           
##  Mcnemar's Test P-Value : 0.6831          
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.8571          
##          Pos Pred Value : 0.9412          
##          Neg Pred Value : 0.9231          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7234          
##       Balanced Accuracy : 0.9134          
##                                           
##        'Positive' Class : CN              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_modelTrain_LRM1_Accuracy<-cm_modelTrain_LRM1$overall["Accuracy"]
cm_modelTrain_LRM1_Kappa<-cm_modelTrain_LRM1$overall["Kappa"]
print(cm_modelTrain_LRM1_Accuracy)
##  Accuracy 
## 0.9361702
print(cm_modelTrain_LRM1_Kappa)
##     Kappa 
## 0.8441989
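The accuracy and kappa that caret reports can be recomputed directly from the 2x2 table. Below is a minimal base-R sketch (the counts are copied from the confusion matrix printed above; `cm`, `expected`, and `kappa` are illustrative names, not caret internals):

```r
# Counts copied from the confusion matrix printed above
cm <- matrix(c(64, 2, 4, 24), nrow = 2,
             dimnames = list(Prediction = c("CN", "Dementia"),
                             Reference  = c("CN", "Dementia")))
n <- sum(cm)
accuracy <- sum(diag(cm)) / n
expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # chance agreement
kappa    <- (accuracy - expected) / (1 - expected)
round(c(accuracy = accuracy, kappa = kappa), 4)   # 0.9362, 0.8442
```

This reproduces `cm_modelTrain_LRM1_Accuracy` and `cm_modelTrain_LRM1_Kappa` above, confirming that kappa here is Cohen's kappa (observed agreement corrected for chance agreement).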
print(model_LRM1)
## glmnet 
## 
## 221 samples
## 282 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa      
##   0.10   0.006203963  0.8417172   0.58709860
##   0.10   0.019618654  0.8372727   0.57500831
##   0.10   0.062039630  0.8191919   0.51714848
##   0.55   0.006203963  0.7558586   0.37823975
##   0.55   0.019618654  0.7333333   0.29699790
##   0.55   0.062039630  0.6607071   0.02328114
##   1.00   0.006203963  0.6836364   0.21227480
##   1.00   0.019618654  0.6610101   0.11375353
##   1.00   0.062039630  0.6652525  -0.05413455
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.006203963.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
modelTrain_LRM1_trainAccuracy<-train_accuracy

print(modelTrain_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
modelTrain_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(modelTrain_mean_accuracy_cv_LRM1)
## [1] 0.7397755
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
 
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value


  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6 ){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
 
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value


  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9816
## [1] "The auc value is:"
## Area under the curve: 0.9816

if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
 
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value


  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_LRM1_AUC <- mean_auc
}
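The one-versus-rest loop above relies on pROC. The AUC it averages can also be sketched in base R via the rank-sum (Mann-Whitney) identity; this is a toy check with hypothetical labels and scores, not a replacement for `pROC::roc`:

```r
# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen positive outranks a randomly chosen negative
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Perfectly separated toy scores give AUC = 1
auc_rank(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))
```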
print(modelTrain_LRM1_AUC)
## Area under the curve: 0.9816
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 282)
## 
##            Overall
## PC1         100.00
## PC2          51.73
## cg02872767   35.59
## cg11787167   33.26
## cg09216282   32.17
## cg01680303   30.99
## cg12080266   29.69
## cg19503462   29.07
## cg02356645   27.95
## cg06378561   27.32
## cg07152869   27.05
## cg12108278   26.99
## cg03084184   26.00
## cg01013522   24.92
## cg26739327   24.31
## cg06864789   23.99
## cg14780448   23.79
## cg02932958   23.65
## cg12858518   23.39
## cg26757229   23.25
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)

ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
##         Overall
## 1   4.624245707
## 2   2.392156119
## 3   1.645580845
## 4   1.538250775
## 5   1.487778623
## 6   1.433064454
## 7   1.372991350
## 8   1.344417772
## 9   1.292393615
## 10  1.263438190
## 11  1.250638661
## 12  1.248000174
## 13  1.202261711
## 14  1.152431634
## 15  1.124385296
## 16  1.109281052
## 17  1.100325375
## 18  1.093746865
## 19  1.081791387
## 20  1.075307142
## 21  1.063235800
## 22  1.046823516
## 23  1.039697777
## 24  1.036839751
## 25  0.999956780
## 26  0.987272096
## 27  0.986599853
## 28  0.982537303
## 29  0.971565342
## 30  0.944575861
## 31  0.915364461
## 32  0.904124375
## 33  0.899184422
## 34  0.898039482
## 35  0.892361340
## 36  0.891965900
## 37  0.891453828
## 38  0.882747331
## 39  0.880680596
## 40  0.876583598
## 41  0.860961540
## 42  0.841486018
## 43  0.830215491
## 44  0.825033319
## 45  0.817955259
## 46  0.812905344
## 47  0.812663328
## 48  0.805972859
## 49  0.804561596
## 50  0.790335561
## 51  0.782756825
## 52  0.773980807
## 53  0.759520129
## 54  0.759336168
## 55  0.750053927
## 56  0.748401073
## 57  0.743665624
## 58  0.733625974
## 59  0.729696208
## 60  0.729499864
## 61  0.724689609
## 62  0.721807876
## 63  0.719669695
## 64  0.718815674
## 65  0.709234290
## 66  0.706227637
## 67  0.703319159
## 68  0.702357814
## 69  0.697580454
## 70  0.675400967
## 71  0.673631751
## 72  0.673170256
## 73  0.668179409
## 74  0.666684556
## 75  0.665167944
## 76  0.660120537
## 77  0.657679482
## 78  0.653435303
## 79  0.647233807
## 80  0.641830635
## 81  0.636647918
## 82  0.634619327
## 83  0.633994976
## 84  0.631849637
## 85  0.626422434
## 86  0.623597149
## 87  0.616255116
## 88  0.588449800
## 89  0.586544412
## 90  0.564235590
## 91  0.563750299
## 92  0.560606193
## 93  0.551671979
## 94  0.544788970
## 95  0.544600820
## 96  0.542612076
## 97  0.541302663
## 98  0.531224539
## 99  0.529309716
## 100 0.519161762
## 101 0.514088680
## 102 0.512149808
## 103 0.510197028
## 104 0.503265695
## 105 0.498952978
## 106 0.495480147
## 107 0.490462742
## 108 0.486415721
## 109 0.486147155
## 110 0.485633525
## 111 0.475460950
## 112 0.472648794
## 113 0.459074195
## 114 0.452346125
## 115 0.445628450
## 116 0.443366656
## 117 0.443047109
## 118 0.441169601
## 119 0.439340461
## 120 0.438360947
## 121 0.436117393
## 122 0.435754029
## 123 0.433450217
## 124 0.428787118
## 125 0.425523120
## 126 0.420386140
## 127 0.418570455
## 128 0.418119445
## 129 0.407112381
## 130 0.405950654
## 131 0.403744117
## 132 0.395936675
## 133 0.395241694
## 134 0.386753740
## 135 0.385816845
## 136 0.380850339
## 137 0.377819618
## 138 0.370634938
## 139 0.367377663
## 140 0.364175084
## 141 0.363089667
## 142 0.362854052
## 143 0.359301333
## 144 0.358914011
## 145 0.358539646
## 146 0.355240731
## 147 0.354496539
## 148 0.352190144
## 149 0.349989176
## 150 0.340078723
## 151 0.334043714
## 152 0.333856625
## 153 0.331466392
## 154 0.331168510
## 155 0.327254741
## 156 0.323391439
## 157 0.318079769
## 158 0.317952069
## 159 0.314120492
## 160 0.312686044
## 161 0.312489594
## 162 0.309532083
## 163 0.305724366
## 164 0.300900702
## 165 0.299481583
## 166 0.293126957
## 167 0.288680647
## 168 0.279521665
## 169 0.278145483
## 170 0.267575440
## 171 0.265352101
## 172 0.261791341
## 173 0.260836072
## 174 0.256620174
## 175 0.249486860
## 176 0.247961773
## 177 0.247354782
## 178 0.243168742
## 179 0.235373911
## 180 0.230376278
## 181 0.223563635
## 182 0.223464321
## 183 0.220607010
## 184 0.217831514
## 185 0.212570505
## 186 0.211195918
## 187 0.209845122
## 188 0.204032826
## 189 0.203921119
## 190 0.201878820
## 191 0.194556000
## 192 0.192577687
## 193 0.191126059
## 194 0.189374449
## 195 0.185380508
## 196 0.180973397
## 197 0.177178547
## 198 0.176444983
## 199 0.169223360
## 200 0.164018286
## 201 0.160627016
## 202 0.160339893
## 203 0.150193844
## 204 0.147120168
## 205 0.143172315
## 206 0.140973922
## 207 0.140648911
## 208 0.138406368
## 209 0.134770846
## 210 0.134575866
## 211 0.134197709
## 212 0.133728778
## 213 0.125810161
## 214 0.116655994
## 215 0.104937959
## 216 0.104615352
## 217 0.095114115
## 218 0.095023579
## 219 0.090134603
## 220 0.083191991
## 221 0.080041734
## 222 0.079361138
## 223 0.076180523
## 224 0.058236167
## 225 0.057002450
## 226 0.055633835
## 227 0.051296454
## 228 0.050388872
## 229 0.040693182
## 230 0.040211859
## 231 0.039609242
## 232 0.039016254
## 233 0.025537584
## 234 0.023195819
## 235 0.018011648
## 236 0.015225411
## 237 0.012178814
## 238 0.011663746
## 239 0.008631612
## 240 0.002340257
## 241 0.002229245
## 242 0.000000000
## 243 0.000000000
## 244 0.000000000
## 245 0.000000000
## 246 0.000000000
## 247 0.000000000
## 248 0.000000000
## 249 0.000000000
## 250 0.000000000
## 251 0.000000000
## 252 0.000000000
## 253 0.000000000
## 254 0.000000000
## 255 0.000000000
## 256 0.000000000
## 257 0.000000000
## 258 0.000000000
## 259 0.000000000
## 260 0.000000000
## 261 0.000000000
## 262 0.000000000
## 263 0.000000000
## 264 0.000000000
## 265 0.000000000
## 266 0.000000000
## 267 0.000000000
## 268 0.000000000
## 269 0.000000000
## 270 0.000000000
## 271 0.000000000
## 272 0.000000000
## 273 0.000000000
## 274 0.000000000
## 275 0.000000000
## 276 0.000000000
## 277 0.000000000
## 278 0.000000000
## 279 0.000000000
## 280 0.000000000
## 281 0.000000000
## 282 0.000000000
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, keep the maximum importance value
  # across classes for each feature, add it as a new column,
  # and rank features by it
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
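The `pmax`-based selection in the multi-class branch can be illustrated on a tiny hypothetical importance table (toy CpG names and values, base R only):

```r
# Hypothetical per-class importance values for three CpGs
imp <- data.frame(
  CN       = c(10, 80, 30),
  Dementia = c(90, 20, 40),
  MCI      = c(15, 25, 95),
  row.names = c("cgA", "cgB", "cgC")
)
imp$Feature <- rownames(imp)

# Keep the maximum importance across classes, then rank by it
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp <- imp[order(-imp$MaxImportance), ]
print(imp$Feature)  # "cgC" "cgA" "cgB"
```

A feature is thus ranked by its strongest class-specific importance, so a CpG that matters for only one class can still reach the top of the list.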
# Load reshape2, installing it first if missing
# (require() already attaches the package on success)
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("Top 20 features ranked by maximum importance across classes:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

2.2 Model Diagnosis & Improvement

2.2.1 Class Imbalance

Class Imbalance Check

  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia 
##      221       94
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia 
## 0.7015873 0.2984127
table(trainData$DX)
## 
##       CN Dementia 
##      155       66
prop.table(table(trainData$DX))
## 
##        CN  Dementia 
## 0.7013575 0.2986425
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 2.351064
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 2.348485
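As a self-contained illustration of the ratio on toy labels (hypothetical data, not the ADNI set):

```r
# Hypothetical toy labels: 7 controls, 3 cases -- for illustration only
dx <- factor(c(rep("CN", 7), rep("Dementia", 3)))

class_counts <- table(dx)
imbalance_ratio <- max(class_counts) / min(class_counts)
print(imbalance_ratio)  # 7 / 3, i.e. about 2.33
```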
  • Let’s run a Chi-squared test, which determines whether the class distribution deviates significantly from a balanced one. The p-value from the test indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 51.203, df = 1, p-value = 8.328e-13
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 35.842, df = 1, p-value = 2.14e-09

Address Class Imbalance Using “SMOTE” (NOT FINALIZED YET; MAY NEED FURTHER IMPROVEMENT)

library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)


balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia 
##      155      132
dim(balanced_data_LGR_1)
## [1] 287 283
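The class counts after SMOTE can be checked with simple arithmetic, assuming (as the counts above suggest) that `dup_size = 1` synthesizes one new minority sample per original minority sample; the variable names below are illustrative:

```r
# Counts mirroring the run above: 155 majority, 66 minority, dup_size = 1
n_majority <- 155
n_minority <- 66
dup_size   <- 1
n_synthetic <- n_minority * dup_size  # one synthetic sample per minority sample
c(CN = n_majority, Dementia = n_minority + n_synthetic)  # 155, 132
n_majority + n_minority + n_synthetic                    # 287 rows in total
```

This matches the 155/132 split and the 287-row dimension printed above; the classes are closer to balanced but not exactly equal.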

Fit Model with Balanced Data

ctrl <- trainControl(method = "cv", number = 5)


model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)


predictions <- predict(model_LRM2, newdata = testData)
cm_modelTrain_LRM2<-caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        5
##   Dementia  2       23
##                                           
##                Accuracy : 0.9255          
##                  95% CI : (0.8526, 0.9695)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 1.131e-07       
##                                           
##                   Kappa : 0.8163          
##                                           
##  Mcnemar's Test P-Value : 0.4497          
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.8214          
##          Pos Pred Value : 0.9275          
##          Neg Pred Value : 0.9200          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7340          
##       Balanced Accuracy : 0.8956          
##                                           
##        'Positive' Class : CN              
## 
cm_modelTrain_LRM2_Accuracy<-cm_modelTrain_LRM2$overall["Accuracy"]
cm_modelTrain_LRM2_Kappa<-cm_modelTrain_LRM2$overall["Kappa"]
print(cm_modelTrain_LRM2_Accuracy)
##  Accuracy 
## 0.9255319
print(cm_modelTrain_LRM2_Kappa)
##     Kappa 
## 0.8163037
print(model_LRM2)
## glmnet 
## 
## 287 samples
## 282 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 230, 229, 230, 229, 230 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0002581477  0.9441621  0.8883815
##   0.10   0.0025814774  0.9441621  0.8883815
##   0.10   0.0258147743  0.9441621  0.8883815
##   0.55   0.0002581477  0.8675136  0.7368485
##   0.55   0.0025814774  0.8675136  0.7368485
##   0.55   0.0258147743  0.8502117  0.7025165
##   1.00   0.0002581477  0.8466425  0.6957712
##   1.00   0.0025814774  0.8535995  0.7094889
##   1.00   0.0258147743  0.8292196  0.6592661
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.02581477.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)

modelTrain_LRM2_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", modelTrain_LRM2_trainAccuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8830208
modelTrain_LRM2_mean_accuracy_model_LRM2 <- mean_accuracy_model_LRM2
print(modelTrain_LRM2_mean_accuracy_model_LRM2)
## [1] 0.8830208
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 282)
## 
##            Overall
## PC1         100.00
## PC2          34.80
## cg11787167   33.42
## cg02872767   33.17
## cg09216282   31.58
## cg19503462   29.73
## cg01680303   29.47
## cg07152869   28.10
## cg12080266   28.03
## cg02356645   27.70
## cg06378561   25.86
## cg03084184   25.79
## cg12108278   25.65
## cg03982462   25.30
## cg26739327   24.68
## cg06864789   24.65
## cg04124201   24.63
## cg12858518   24.32
## cg01013522   24.18
## cg27286614   23.48
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3||METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG ==5 || METHOD_FEATURE_FLAG == 6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)  
  
}
##         Overall
## 1   3.768270244
## 2   1.311239387
## 3   1.259240451
## 4   1.250079235
## 5   1.189931374
## 6   1.120226655
## 7   1.110430727
## 8   1.058856229
## 9   1.056425637
## 10  1.043718047
## 11  0.974563252
## 12  0.971653997
## 13  0.966667318
## 14  0.953274882
## 15  0.930016076
## 16  0.928926296
## 17  0.928165888
## 18  0.916266554
## 19  0.911117892
## 20  0.884928743
## 21  0.865522936
## 22  0.863982460
## 23  0.853244732
## 24  0.851382493
## 25  0.839599742
## 26  0.834236801
## 27  0.830010549
## 28  0.782299220
## 29  0.770825921
## 30  0.742027014
## 31  0.736337194
## 32  0.732887620
## 33  0.717567980
## 34  0.712934582
## 35  0.708494833
## 36  0.700599344
## 37  0.697473557
## 38  0.678294126
## 39  0.676038750
## 40  0.671766621
## 41  0.668052156
## 42  0.667516266
## 43  0.666759189
## 44  0.666278996
## 45  0.662027846
## 46  0.661390795
## 47  0.655288739
## 48  0.641038433
## 49  0.639660468
## 50  0.623047519
## 51  0.619271733
## 52  0.610359862
## 53  0.607110141
## 54  0.592971808
## 55  0.591518828
## 56  0.588736598
## 57  0.574503926
## 58  0.572820304
## 59  0.567201955
## 60  0.564948581
## 61  0.555094362
## 62  0.548297522
## 63  0.541955889
## 64  0.539849414
## 65  0.539447557
## 66  0.537363171
## 67  0.535892230
## 68  0.534979240
## 69  0.533320169
## 70  0.529930496
## 71  0.524655278
## 72  0.516393765
## 73  0.511945336
## 74  0.508204272
## 75  0.505126491
## 76  0.504453782
## 77  0.492163178
## 78  0.489698613
## 79  0.482981871
## 80  0.476102076
## 81  0.473534838
## 82  0.473251583
## 83  0.471653925
## 84  0.470060840
## 85  0.462038666
## 86  0.456348189
## 87  0.454039173
## 88  0.451006130
## 89  0.441529480
## 90  0.438947478
## 91  0.438290687
## 92  0.437437832
## 93  0.433236332
## 94  0.431436140
## 95  0.426894323
## 96  0.426205528
## 97  0.419780666
## 98  0.408429114
## 99  0.407500989
## 100 0.406145056
## 101 0.400734818
## 102 0.392814276
## 103 0.392655990
## 104 0.391391946
## 105 0.387651964
## 106 0.380553328
## 107 0.379055978
## 108 0.375703385
## 109 0.375659442
## 110 0.371867955
## 111 0.369704214
## 112 0.369367616
## 113 0.369195132
## 114 0.366999167
## 115 0.364374279
## 116 0.361937639
## 117 0.361425186
## 118 0.360207259
## 119 0.357663977
## 120 0.356806716
## 121 0.352939986
## 122 0.351843489
## 123 0.351589589
## 124 0.347987247
## 125 0.346059836
## 126 0.320515958
## 127 0.319359398
## 128 0.317516444
## 129 0.303459915
## 130 0.303365814
## 131 0.297919188
## 132 0.297465253
## 133 0.296099242
## 134 0.292626421
## 135 0.291696948
## 136 0.289361673
## 137 0.285607969
## 138 0.281630280
## 139 0.280102567
## 140 0.279278621
## 141 0.277598007
## 142 0.275589734
## 143 0.273847126
## 144 0.272510856
## 145 0.271277037
## 146 0.265936655
## 147 0.264485900
## 148 0.259401031
## 149 0.257774300
## 150 0.256352383
## 151 0.255976582
## 152 0.255764102
## 153 0.254587394
## 154 0.253165194
## 155 0.252829741
## 156 0.249788335
## 157 0.249480739
## 158 0.248560205
## 159 0.246891941
## 160 0.239434791
## 161 0.237249985
## 162 0.235503913
## 163 0.232961135
## 164 0.230096917
## 165 0.227320445
## 166 0.214980306
## 167 0.212439884
## 168 0.210212901
## 169 0.207614694
## 170 0.205578159
## 171 0.203344307
## 172 0.202498659
## 173 0.200960623
## 174 0.195924580
## 175 0.193800778
## 176 0.190882886
## 177 0.188771451
## 178 0.186545960
## 179 0.186137761
## 180 0.182182794
## 181 0.171166791
## 182 0.169680298
## 183 0.167354171
## 184 0.162074526
## 185 0.158088313
## 186 0.149568808
## 187 0.146042575
## 188 0.141623572
## 189 0.138133227
## 190 0.133284716
## 191 0.124257395
## 192 0.122686610
## 193 0.121548245
## 194 0.119560212
## 195 0.119467493
## 196 0.114511339
## 197 0.108845560
## 198 0.103608149
## 199 0.103554786
## 200 0.102390841
## 201 0.100098486
## 202 0.092144400
## 203 0.092118276
## 204 0.091327240
## 205 0.087588415
## 206 0.087012919
## 207 0.083398586
## 208 0.080323270
## 209 0.077025073
## 210 0.073182209
## 211 0.068292321
## 212 0.067576613
## 213 0.066655228
## 214 0.066412448
## 215 0.064402440
## 216 0.063446242
## 217 0.058056744
## 218 0.057608147
## 219 0.056339778
## 220 0.055596669
## 221 0.054873534
## 222 0.051810564
## 223 0.050641114
## 224 0.050539392
## 225 0.042579835
## 226 0.040612554
## 227 0.040111342
## 228 0.039616810
## 229 0.029151124
## 230 0.029118324
## 231 0.026375114
## 232 0.020876025
## 233 0.017222613
## 234 0.010144774
## 235 0.004622045
## 236 0.002378908
## 237 0.000000000
## 238 0.000000000
## 239 0.000000000
## 240 0.000000000
## 241 0.000000000
## 242 0.000000000
## 243 0.000000000
## 244 0.000000000
## 245 0.000000000
## 246 0.000000000
## 247 0.000000000
## 248 0.000000000
## 249 0.000000000
## 250 0.000000000
## 251 0.000000000
## 252 0.000000000
## 253 0.000000000
## 254 0.000000000
## 255 0.000000000
## 256 0.000000000
## 257 0.000000000
## 258 0.000000000
## 259 0.000000000
## 260 0.000000000
## 261 0.000000000
## 262 0.000000000
## 263 0.000000000
## 264 0.000000000
## 265 0.000000000
## 266 0.000000000
## 267 0.000000000
## 268 0.000000000
## 269 0.000000000
## 270 0.000000000
## 271 0.000000000
## 272 0.000000000
## 273 0.000000000
## 274 0.000000000
## 275 0.000000000
## 276 0.000000000
## 277 0.000000000
## 278 0.000000000
## 279 0.000000000
## 280 0.000000000
## 281 0.000000000
## 282 0.000000000
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, keep the maximum importance value
  # across classes for each feature, add it as a new column,
  # and rank features by it
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
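The maximum-importance rule used above can be checked on a toy importance table (hypothetical CpG names and values, not the ADNI data):

```r
library(dplyr)

# Hypothetical per-class importance table, shaped like varImp()$importance
# for a three-class model (rows = features, one column per class)
imp <- data.frame(
  CN       = c(10, 80, 30),
  Dementia = c(50, 20, 90),
  MCI      = c(40, 60, 10),
  row.names = c("cg_A", "cg_B", "cg_C")
)

imp$Feature <- rownames(imp)
imp <- imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%  # row-wise maximum
  arrange(desc(MaxImportance))

print(imp$Feature)  # "cg_C" "cg_B" "cg_A"
```

`pmax()` is vectorized over the rows, so each feature is ranked by its single best class-specific importance.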
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("The top 20 features ranked by maximum importance:")
  print(head(importance_model_LRM2_df, n = 20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9773
## [1] "The auc value is:"
## Area under the curve: 0.9773
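The binary AUC above comes from `pROC::roc`; a minimal self-contained sketch with synthetic labels and probabilities (all values here are illustrative, not the ADNI data):

```r
library(pROC)

set.seed(1)
labels <- factor(rep(c("CN", "Dementia"), each = 50))
# Synthetic predicted probabilities for the "Dementia" class
scores <- c(rnorm(50, 0.3, 0.15), rnorm(50, 0.7, 0.15))

# levels = c(control, case); direction "<" means controls score lower than cases
roc_obj <- roc(labels, scores, levels = c("CN", "Dementia"), direction = "<")
cat("AUC:", as.numeric(auc(roc_obj)), "\n")  # near 1 for well-separated scores
```

Fixing `levels` and `direction` explicitly avoids the "Setting direction" messages seen in the output above, which indicate pROC guessed the orientation automatically.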

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # Colors 2:(length(classes) + 1) match the curve colors used above
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG == 1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with the one-versus-rest method is:",
      mean_auc, "\n")
    modelTrain_LRM2_AUC <- mean_auc
}
print(modelTrain_LRM2_AUC)
## Area under the curve: 0.9773
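The one-versus-rest loop above reduces to: binarize the labels once per class, fit one ROC curve per class, then macro-average the AUCs. A compact sketch on synthetic data (random probabilities, so the mean AUC should sit near 0.5):

```r
library(pROC)

set.seed(42)
classes <- c("CN", "MCI", "Dementia")
dx <- factor(sample(classes, 150, replace = TRUE), levels = classes)

# Synthetic class-probability matrix (rows sum to 1), standing in for
# predict(model, newdata = ..., type = "prob")
probs <- matrix(runif(150 * 3), ncol = 3, dimnames = list(NULL, classes))
probs <- probs / rowSums(probs)

auc_values <- sapply(classes, function(cl) {
  binary_labels <- ifelse(dx == cl, 1, 0)          # one class versus the rest
  as.numeric(auc(roc(binary_labels, probs[, cl], quiet = TRUE)))
})
mean_auc <- mean(auc_values)   # macro-averaged one-vs-rest AUC
```

`quiet = TRUE` suppresses pROC's direction messages inside the loop.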

3. Elastic Net

3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)

set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 221 samples
## 282 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa      
##   0      0.00100000  0.7918182   0.37423886
##   0      0.05357895  0.7918182   0.37423886
##   0      0.10615789  0.7918182   0.37423886
##   0      0.15873684  0.7918182   0.37423886
##   0      0.21131579  0.7918182   0.37423886
##   0      0.26389474  0.7918182   0.37423886
##   0      0.31647368  0.7918182   0.37423886
##   0      0.36905263  0.7918182   0.37423886
##   0      0.42163158  0.7918182   0.37423886
##   0      0.47421053  0.7918182   0.37423886
##   0      0.52678947  0.7918182   0.37423886
##   0      0.57936842  0.7918182   0.37423886
##   0      0.63194737  0.7918182   0.37423886
##   0      0.68452632  0.7918182   0.37423886
##   0      0.73710526  0.7918182   0.37423886
##   0      0.78968421  0.7918182   0.37423886
##   0      0.84226316  0.7918182   0.37423886
##   0      0.89484211  0.7918182   0.37423886
##   0      0.94742105  0.7918182   0.37423886
##   0      1.00000000  0.7918182   0.37423886
##   1      0.00100000  0.6969697   0.25680059
##   1      0.05357895  0.6607071  -0.06294811
##   1      0.10615789  0.7014141   0.00000000
##   1      0.15873684  0.7014141   0.00000000
##   1      0.21131579  0.7014141   0.00000000
##   1      0.26389474  0.7014141   0.00000000
##   1      0.31647368  0.7014141   0.00000000
##   1      0.36905263  0.7014141   0.00000000
##   1      0.42163158  0.7014141   0.00000000
##   1      0.47421053  0.7014141   0.00000000
##   1      0.52678947  0.7014141   0.00000000
##   1      0.57936842  0.7014141   0.00000000
##   1      0.63194737  0.7014141   0.00000000
##   1      0.68452632  0.7014141   0.00000000
##   1      0.73710526  0.7014141   0.00000000
##   1      0.78968421  0.7014141   0.00000000
##   1      0.84226316  0.7014141   0.00000000
##   1      0.89484211  0.7014141   0.00000000
##   1      0.94742105  0.7014141   0.00000000
##   1      1.00000000  0.7014141   0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 1.
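The grid `alpha = 0:1` only compares the two penalty extremes: `alpha = 0` is ridge (coefficients shrunk but all kept nonzero) and `alpha = 1` is lasso (a sparse solution). A small standalone `glmnet` sketch on synthetic data illustrates the difference; intermediate `alpha` values would blend the two penalties:

```r
library(glmnet)

set.seed(123)
x <- matrix(rnorm(100 * 20), nrow = 100)              # 100 samples, 20 synthetic predictors
y <- factor(rbinom(100, 1, plogis(x[, 1] - x[, 2])))  # outcome driven by two of them

# alpha = 0: ridge penalty, shrinks coefficients but keeps them all nonzero
fit_ridge <- glmnet(x, y, family = "binomial", alpha = 0, lambda = 1)
# alpha = 1: lasso penalty, zeroes out most coefficients
fit_lasso <- glmnet(x, y, family = "binomial", alpha = 1, lambda = 0.1)

sum(coef(fit_ridge) != 0)  # all 20 predictors plus the intercept survive
sum(coef(fit_lasso) != 0)  # only a sparse subset survives
```

This is why the winning model here (`alpha = 0`) still uses all 282 predictors: ridge never performs variable selection on its own.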
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7454874
modelTrain_mean_accuracy_cv_ENM1 <- mean_accuracy_elastic_net_model1
print(modelTrain_mean_accuracy_cv_ENM1)
## [1] 0.7454874
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

modelTrain_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.932126696832579"
print(modelTrain_ENM1_trainAccuracy)
## [1] 0.9321267
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_modelTrain_ENM1<- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_modelTrain_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       12
##   Dementia  0       16
##                                           
##                Accuracy : 0.8723          
##                  95% CI : (0.7876, 0.9323)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 8.811e-05       
##                                           
##                   Kappa : 0.6519          
##                                           
##  Mcnemar's Test P-Value : 0.001496        
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.5714          
##          Pos Pred Value : 0.8462          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.7021          
##    Detection Prevalence : 0.8298          
##       Balanced Accuracy : 0.7857          
##                                           
##        'Positive' Class : CN              
## 
cm_modelTrain_ENM1_Accuracy <- cm_modelTrain_ENM1$overall["Accuracy"]
print(cm_modelTrain_ENM1_Accuracy)
##  Accuracy 
## 0.8723404
cm_modelTrain_ENM1_Kappa <- cm_modelTrain_ENM1$overall["Kappa"]
print(cm_modelTrain_ENM1_Kappa)
##     Kappa 
## 0.6518519
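The accuracy and kappa extracted above come straight from the `overall` slot of `caret::confusionMatrix`; a tiny worked example (hypothetical labels) where both values can be verified by hand:

```r
library(caret)

# Hypothetical two-class predictions and reference labels
pred <- factor(c("CN", "CN", "Dementia", "CN", "Dementia", "CN"),
               levels = c("CN", "Dementia"))
ref  <- factor(c("CN", "Dementia", "Dementia", "CN", "Dementia", "CN"),
               levels = c("CN", "Dementia"))

cm <- confusionMatrix(pred, ref)
cm$overall["Accuracy"]  # 5/6: five of the six labels agree
cm$overall["Kappa"]     # 2/3: (5/6 - 1/2) / (1 - 1/2), chance agreement = 1/2
```

Kappa discounts the agreement expected by chance from the class marginals, which is why it is far below the raw accuracy for imbalanced splits like the one above.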
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 282)
## 
##            Overall
## PC1         100.00
## PC3          69.09
## PC2          55.08
## cg07152869   42.81
## cg19503462   39.79
## cg09216282   38.91
## cg02872767   36.97
## cg11787167   36.38
## cg26757229   35.53
## cg01013522   35.43
## cg04109990   35.05
## cg26739327   34.75
## cg04124201   34.33
## cg12858518   33.72
## cg06864789   33.51
## cg03982462   33.41
## cg01680303   33.17
## cg02356645   32.96
## cg15775217   32.92
## cg06870118   32.12
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
  importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

  library(dplyr)

  # Keep the CpG names in a column before sorting: dplyr::arrange() drops
  # data.frame row names, which would leave only row numbers in the output.
  importance_elastic_net_final_model1$Feature <- rownames(importance_elastic_net_final_model1)
  Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>%
    arrange(desc(Overall))

  print(Ordered_importance_elastic_net_final_model1)
}
##         Overall
## 1   0.580594834
## 2   0.401470708
## 3   0.320263656
## 4   0.249173431
## 5   0.231624831
## 6   0.226538682
## 7   0.215321811
## 8   0.211864815
## 9   0.206971928
## 10  0.206368205
## 11  0.204167924
## 12  0.202448496
## 13  0.200028102
## 14  0.196465812
## 15  0.195258430
## 16  0.194699618
## 17  0.193281797
## 18  0.192060890
## 19  0.191839717
## 20  0.187205203
## (rows 21-281 omitted; importances decrease monotonically from 0.186 to 0.003)
## 282 0.001042609
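When sorting a `varImp` table with `dplyr::arrange`, note that `arrange()` drops data-frame row names, which is why the table above shows bare row numbers instead of CpG IDs. A sketch of a top-N selection that preserves the feature names (toy values, hypothetical CpG names):

```r
library(dplyr)
library(tibble)

# Hypothetical varImp-style table: a single "Overall" column,
# feature names stored as row names
imp <- data.frame(
  Overall = c(0.12, 0.58, 0.31),
  row.names = c("cg_X", "cg_Y", "cg_Z")
)

top_n <- 2
ordered <- imp %>%
  rownames_to_column("Feature") %>%   # keep the names; arrange() drops rownames
  arrange(desc(Overall))
top_features <- head(ordered$Feature, top_n)
print(top_features)  # "cg_Y" "cg_Z"
```

Moving the row names into a `Feature` column first keeps the CpG IDs available for the later Pareto-optimal filtering step.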
if(METHOD_FEATURE_FLAG == 1){
  # Multi-class case: for each feature, take the maximum importance
  # across the three classes, then sort by that maximum.
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("The top 20 features ranked by maximum importance:")
  print(head(importance_elastic_net_model1_df, n = 20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG ==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.9946

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # Colors 2:(length(classes) + 1) match the curve colors used above
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG == 1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with the one-versus-rest method is:",
      mean_auc, "\n")
    modelTrain_ENM1_AUC <- mean_auc
}
print(modelTrain_ENM1_AUC)
## Area under the curve: 0.9946

4. XGBoost

4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
# Start point of parallel processing
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)
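The chunk above registers a `doParallel` backend but never releases it; a sketch of the full register/stop lifecycle (the placeholder comment stands in for the `caret::train` call):

```r
library(doParallel)

numCores <- max(1, parallel::detectCores() - 1)  # leave one core for the OS
cl <- makeCluster(numCores)
registerDoParallel(cl)

# ... caret::train(..., trControl = trainControl(allowParallel = TRUE)) runs here ...

stopCluster(cl)   # release the worker processes once training is done
registerDoSEQ()   # restore the sequential foreach backend
```

Without `stopCluster`, the worker R processes linger until the session ends; `registerDoSEQ()` prevents later `foreach` calls from targeting the dead cluster.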

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 221 samples
## 282 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa     
##   0.3  1          0.6               0.50        50      0.6967677  0.15749836
##   0.3  1          0.6               0.50       100      0.7419192  0.30300009
##   0.3  1          0.6               0.50       150      0.7554545  0.34349334
##   0.3  1          0.6               0.75        50      0.6879798  0.11174252
##   0.3  1          0.6               0.75       100      0.7105051  0.21053665
##   0.3  1          0.6               0.75       150      0.7376768  0.27073386
##   0.3  1          0.6               1.00        50      0.6832323  0.11286296
##   0.3  1          0.6               1.00       100      0.7106061  0.19604033
##   0.3  1          0.6               1.00       150      0.7194949  0.21568975
##   0.3  1          0.8               0.50        50      0.6969697  0.16263666
##   0.3  1          0.8               0.50       100      0.7284848  0.25984423
##   0.3  1          0.8               0.50       150      0.7557576  0.32387768
##   0.3  1          0.8               0.75        50      0.6742424  0.10495292
##   0.3  1          0.8               0.75       100      0.7106061  0.22023187
##   0.3  1          0.8               0.75       150      0.7288889  0.25802299
##   0.3  1          0.8               1.00        50      0.6926263  0.13052813
##   0.3  1          0.8               1.00       100      0.6969697  0.15565128
##   0.3  1          0.8               1.00       150      0.7149495  0.21234014
##   0.3  2          0.6               0.50        50      0.7103030  0.18100792
##   0.3  2          0.6               0.50       100      0.7375758  0.27653629
##   0.3  2          0.6               0.50       150      0.7375758  0.29418880
##   0.3  2          0.6               0.75        50      0.6744444  0.08498686
##   0.3  2          0.6               0.75       100      0.7107071  0.19440261
##   0.3  2          0.6               0.75       150      0.7152525  0.20975514
##   0.3  2          0.6               1.00        50      0.7194949  0.23326164
##   0.3  2          0.6               1.00       100      0.7013131  0.15366947
##   0.3  2          0.6               1.00       150      0.6968687  0.16358527
##   0.3  2          0.8               0.50        50      0.7012121  0.16063494
##   0.3  2          0.8               0.50       100      0.7374747  0.25220127
##   0.3  2          0.8               0.50       150      0.7329293  0.24352759
##   0.3  2          0.8               0.75        50      0.6923232  0.16369463
##   0.3  2          0.8               0.75       100      0.7014141  0.16991185
##   0.3  2          0.8               0.75       150      0.7106061  0.17672008
##   0.3  2          0.8               1.00        50      0.7240404  0.22041850
##   0.3  2          0.8               1.00       100      0.7149495  0.19669523
##   0.3  2          0.8               1.00       150      0.7239394  0.22821254
##   0.3  3          0.6               0.50        50      0.7327273  0.29084803
##   0.3  3          0.6               0.50       100      0.7463636  0.31612263
##   0.3  3          0.6               0.50       150      0.7418182  0.30543039
##   0.3  3          0.6               0.75        50      0.7014141  0.10540885
##   0.3  3          0.6               0.75       100      0.7104040  0.14340547
##   0.3  3          0.6               0.75       150      0.7014141  0.11673393
##   0.3  3          0.6               1.00        50      0.7330303  0.22048632
##   0.3  3          0.6               1.00       100      0.7375758  0.24882545
##   0.3  3          0.6               1.00       150      0.7376768  0.24823514
##   0.3  3          0.8               0.50        50      0.7421212  0.27163219
##   0.3  3          0.8               0.50       100      0.7467677  0.27857750
##   0.3  3          0.8               0.50       150      0.7603030  0.32166134
##   0.3  3          0.8               0.75        50      0.6652525  0.06195729
##   0.3  3          0.8               0.75       100      0.6742424  0.08852781
##   0.3  3          0.8               0.75       150      0.6923232  0.15433423
##   0.3  3          0.8               1.00        50      0.7015152  0.13673411
##   0.3  3          0.8               1.00       100      0.7196970  0.17667182
##   0.3  3          0.8               1.00       150      0.7151515  0.16152601
##   0.4  1          0.6               0.50        50      0.7016162  0.22540176
##   0.4  1          0.6               0.50       100      0.7197980  0.27135310
##   0.4  1          0.6               0.50       150      0.7376768  0.30547481
##   0.4  1          0.6               0.75        50      0.6789899  0.14278517
##   0.4  1          0.6               0.75       100      0.7152525  0.22665245
##   0.4  1          0.6               0.75       150      0.7286869  0.26444796
##   0.4  1          0.6               1.00        50      0.7015152  0.15535815
##   0.4  1          0.6               1.00       100      0.6968687  0.15420807
##   0.4  1          0.6               1.00       150      0.7330303  0.25680205
##   0.4  1          0.8               0.50        50      0.6652525  0.14474729
##   0.4  1          0.8               0.50       100      0.7060606  0.25144331
##   0.4  1          0.8               0.50       150      0.7196970  0.29700748
##   0.4  1          0.8               0.75        50      0.7197980  0.23208701
##   0.4  1          0.8               0.75       100      0.7422222  0.28376242
##   0.4  1          0.8               0.75       150      0.7693939  0.36199587
##   0.4  1          0.8               1.00        50      0.6744444  0.09133777
##   0.4  1          0.8               1.00       100      0.7149495  0.21421466
##   0.4  1          0.8               1.00       150      0.7376768  0.29382544
##   0.4  2          0.6               0.50        50      0.6879798  0.18512182
##   0.4  2          0.6               0.50       100      0.7015152  0.22571431
##   0.4  2          0.6               0.50       150      0.7014141  0.22552890
##   0.4  2          0.6               0.75        50      0.7015152  0.16802119
##   0.4  2          0.6               0.75       100      0.7059596  0.16559975
##   0.4  2          0.6               0.75       150      0.7105051  0.18321310
##   0.4  2          0.6               1.00        50      0.7014141  0.17186852
##   0.4  2          0.6               1.00       100      0.7149495  0.20086631
##   0.4  2          0.6               1.00       150      0.7149495  0.20086631
##   0.4  2          0.8               0.50        50      0.7057576  0.19875962
##   0.4  2          0.8               0.50       100      0.7239394  0.24886549
##   0.4  2          0.8               0.50       150      0.7239394  0.25416087
##   0.4  2          0.8               0.75        50      0.7194949  0.21387642
##   0.4  2          0.8               0.75       100      0.7331313  0.26258251
##   0.4  2          0.8               0.75       150      0.7240404  0.23931573
##   0.4  2          0.8               1.00        50      0.6971717  0.13144722
##   0.4  2          0.8               1.00       100      0.6834343  0.08375640
##   0.4  2          0.8               1.00       150      0.6834343  0.08375640
##   0.4  3          0.6               0.50        50      0.7376768  0.28826752
##   0.4  3          0.6               0.50       100      0.7285859  0.26607783
##   0.4  3          0.6               0.50       150      0.7285859  0.27886967
##   0.4  3          0.6               0.75        50      0.7242424  0.22397289
##   0.4  3          0.6               0.75       100      0.7240404  0.22332663
##   0.4  3          0.6               0.75       150      0.7241414  0.22425427
##   0.4  3          0.6               1.00        50      0.7242424  0.20555618
##   0.4  3          0.6               1.00       100      0.7106061  0.16146745
##   0.4  3          0.6               1.00       150      0.7106061  0.16146745
##   0.4  3          0.8               0.50        50      0.7147475  0.19958394
##   0.4  3          0.8               0.50       100      0.7329293  0.24215779
##   0.4  3          0.8               0.50       150      0.7329293  0.24195966
##   0.4  3          0.8               0.75        50      0.7466667  0.26679377
##   0.4  3          0.8               0.75       100      0.7284848  0.22530949
##   0.4  3          0.8               0.75       150      0.7284848  0.22530949
##   0.4  3          0.8               1.00        50      0.6557576  0.05225132
##   0.4  3          0.8               1.00       100      0.6783838  0.10100346
##   0.4  3          0.8               1.00       150      0.6783838  0.10100346
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 1, eta = 0.4, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample = 0.75.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.7143734
modelTrain_mean_accuracy_cv_xgb <- mean_accuracy_xgb_model
print(modelTrain_mean_accuracy_cv_xgb)
## [1] 0.7143734
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)

modelTrain_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", modelTrain_xgb_trainAccuracy))
## [1] "Training Accuracy:  1"
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_modelTrain_xgb <- caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_modelTrain_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       61       18
##   Dementia  5       10
##                                           
##                Accuracy : 0.7553          
##                  95% CI : (0.6558, 0.8381)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.15488         
##                                           
##                   Kappa : 0.3248          
##                                           
##  Mcnemar's Test P-Value : 0.01234         
##                                           
##             Sensitivity : 0.9242          
##             Specificity : 0.3571          
##          Pos Pred Value : 0.7722          
##          Neg Pred Value : 0.6667          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6489          
##    Detection Prevalence : 0.8404          
##       Balanced Accuracy : 0.6407          
##                                           
##        'Positive' Class : CN              
## 
cm_modelTrain_xgb_Accuracy <- cm_modelTrain_xgb$overall["Accuracy"]
cm_modelTrain_xgb_Kappa <- cm_modelTrain_xgb$overall["Kappa"]
print(cm_modelTrain_xgb_Accuracy)
##  Accuracy 
## 0.7553191
print(cm_modelTrain_xgb_Kappa)
##    Kappa 
## 0.324797
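As a sanity check, the Accuracy and Kappa above can be recomputed by hand from the 2×2 confusion matrix printed earlier. A minimal base-R sketch (the counts are the CN/Dementia cells from the table above):

```r
# Recompute Accuracy and Cohen's kappa from the XGBoost confusion matrix.
# Rows = predicted class, columns = reference class.
cm <- matrix(c(61, 5, 18, 10), nrow = 2,
             dimnames = list(Prediction = c("CN", "Dementia"),
                             Reference  = c("CN", "Dementia")))
n  <- sum(cm)
po <- sum(diag(cm)) / n                     # observed agreement (accuracy)
pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa <- (po - pe) / (1 - pe)

round(po, 4)     # 0.7553, matching the Accuracy above
round(kappa, 4)  # 0.3248, matching the Kappa above
```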
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 282)
## 
##            Overall
## cg23836570  100.00
## cg06864789   94.70
## cg01013522   93.95
## cg24861747   91.64
## cg23698271   89.89
## cg00999469   89.72
## cg16390578   85.88
## cg13885788   85.74
## cg27114706   84.92
## cg25561557   82.71
## cg18037388   81.70
## cg25174111   71.48
## cg15775217   71.02
## cg26739327   68.79
## cg24859648   62.71
## cg03172493   61.34
## cg04242342   61.31
## cg20507276   59.25
## cg02356645   58.79
## cg06697310   56.07
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##        Feature         Gain       Cover   Frequency   Importance
##         <char>        <num>       <num>       <num>        <num>
##  1: cg23836570 0.0327745421 0.017027269 0.006666667 0.0327745421
##  2: cg06864789 0.0310376535 0.024182008 0.020000000 0.0310376535
##  3: cg01013522 0.0307921072 0.017343895 0.013333333 0.0307921072
##  4: cg24861747 0.0300338441 0.019787540 0.013333333 0.0300338441
##  5: cg23698271 0.0294608305 0.021641209 0.013333333 0.0294608305
##  6: cg00999469 0.0294069073 0.023462986 0.013333333 0.0294069073
##  7: cg16390578 0.0281475851 0.027280790 0.026666667 0.0281475851
##  8: cg13885788 0.0281008399 0.020536483 0.006666667 0.0281008399
##  9: cg27114706 0.0278309521 0.016329087 0.006666667 0.0278309521
## 10: cg25561557 0.0271081428 0.016718011 0.006666667 0.0271081428
## 11: cg18037388 0.0267757575 0.020583184 0.013333333 0.0267757575
## 12: cg25174111 0.0234267985 0.016381306 0.006666667 0.0234267985
## 13: cg15775217 0.0232766868 0.017317994 0.006666667 0.0232766868
## 14: cg26739327 0.0225452652 0.019750990 0.020000000 0.0225452652
## 15: cg24859648 0.0205537125 0.018232123 0.020000000 0.0205537125
## 16: cg03172493 0.0201033520 0.013037474 0.006666667 0.0201033520
## 17: cg04242342 0.0200935392 0.015739522 0.013333333 0.0200935392
## 18: cg20507276 0.0194201121 0.013447035 0.006666667 0.0194201121
## 19: cg02356645 0.0192671001 0.021801801 0.026666667 0.0192671001
## 20: cg06697310 0.0183776453 0.011809301 0.006666667 0.0183776453
## 21: cg23350716 0.0183192380 0.012031099 0.006666667 0.0183192380
## 22: cg04124201 0.0179779687 0.017675955 0.013333333 0.0179779687
## 23: cg12279734 0.0172886375 0.017388903 0.013333333 0.0172886375
## 24: cg09650803 0.0168243561 0.014182728 0.006666667 0.0168243561
## 25: cg14780448 0.0166389253 0.013071445 0.006666667 0.0166389253
## 26: cg02095601 0.0161032601 0.017860915 0.013333333 0.0161032601
## 27: cg26901661 0.0160348462 0.014722216 0.006666667 0.0160348462
## 28: cg22274273 0.0150648046 0.014456404 0.013333333 0.0150648046
## 29: cg18339359 0.0141749286 0.015958107 0.013333333 0.0141749286
## 30: cg22901347 0.0136977887 0.012988228 0.013333333 0.0136977887
## 31: cg26983017 0.0128410528 0.011567190 0.006666667 0.0128410528
## 32: cg10542624 0.0120089954 0.018173232 0.026666667 0.0120089954
## 33: cg02078724 0.0117461055 0.011838159 0.006666667 0.0117461055
## 34: cg18662228 0.0115825625 0.015172395 0.013333333 0.0115825625
## 35: cg23916408 0.0115229062 0.011007604 0.006666667 0.0115229062
## 36: cg20218135 0.0113103980 0.019119813 0.026666667 0.0113103980
## 37: cg03084184 0.0108557998 0.011920677 0.013333333 0.0108557998
## 38: cg07152869 0.0102537459 0.010539671 0.006666667 0.0102537459
## 39: cg14252149 0.0099782830 0.009382127 0.006666667 0.0099782830
## 40: cg00421199 0.0094152318 0.012024100 0.013333333 0.0094152318
## 41: cg04109990 0.0092667272 0.011758414 0.013333333 0.0092667272
## 42: cg12080266 0.0091577129 0.014683199 0.013333333 0.0091577129
## 43: cg22071943 0.0090437100 0.011133333 0.006666667 0.0090437100
## 44: cg15591384 0.0089969311 0.008407956 0.006666667 0.0089969311
## 45: cg06378561 0.0089793484 0.015592566 0.020000000 0.0089793484
## 46: cg12858518 0.0083743905 0.009217561 0.006666667 0.0083743905
## 47: cg05096415 0.0082810148 0.010516857 0.013333333 0.0082810148
## 48: cg03982462 0.0081355457 0.013222267 0.020000000 0.0081355457
## 49: cg16268937 0.0077561535 0.008879502 0.006666667 0.0077561535
## 50: cg02389264 0.0076490485 0.008077554 0.006666667 0.0076490485
## 51: cg26948066 0.0073978997 0.011846897 0.013333333 0.0073978997
## 52: cg02901522 0.0070238078 0.011895115 0.013333333 0.0070238078
## 53: cg05749243 0.0063892265 0.007413173 0.006666667 0.0063892265
## 54: cg19512141 0.0063873478 0.008107146 0.013333333 0.0063873478
## 55: cg12240569 0.0057874064 0.007176593 0.006666667 0.0057874064
## 56: cg10058204 0.0056188053 0.006990387 0.006666667 0.0056188053
## 57: cg01280698 0.0051669009 0.007154495 0.006666667 0.0051669009
## 58: cg11358878 0.0049745144 0.007083128 0.006666667 0.0049745144
## 59: cg27286614 0.0047093488 0.008143924 0.013333333 0.0047093488
## 60: cg11314779 0.0046231527 0.008929069 0.013333333 0.0046231527
## 61: cg12012426 0.0043119630 0.007432628 0.006666667 0.0043119630
## 62: cg00841008 0.0038918987 0.005864697 0.006666667 0.0038918987
## 63: cg10844498 0.0037614309 0.007263332 0.013333333 0.0037614309
## 64:        PC1 0.0034890547 0.008534827 0.020000000 0.0034890547
## 65: cg18029737 0.0034785539 0.005756951 0.006666667 0.0034785539
## 66: cg05841700 0.0032694977 0.005685044 0.013333333 0.0032694977
## 67: cg26757229 0.0030051628 0.004109165 0.006666667 0.0030051628
## 68: cg06870118 0.0029605472 0.005001528 0.006666667 0.0029605472
## 69: cg24065597 0.0029334326 0.004465203 0.006666667 0.0029334326
## 70: cg10701746 0.0028155873 0.005170580 0.006666667 0.0028155873
## 71: cg20913114 0.0027581827 0.005594966 0.013333333 0.0027581827
## 72: cg24851651 0.0025379934 0.004600551 0.006666667 0.0025379934
## 73: cg02494911 0.0025244571 0.006191726 0.013333333 0.0025244571
## 74: cg01680303 0.0025028687 0.004600741 0.006666667 0.0025028687
## 75: cg01130884 0.0023594602 0.003191772 0.006666667 0.0023594602
## 76: cg00322003 0.0023433962 0.003775580 0.006666667 0.0023433962
## 77: cg19503462 0.0021770481 0.005706923 0.013333333 0.0021770481
## 78: cg08584917 0.0019281533 0.004035198 0.006666667 0.0019281533
## 79: cg21986118 0.0019253875 0.003933209 0.006666667 0.0019253875
## 80: cg04218584 0.0019040799 0.005502655 0.013333333 0.0019040799
## 81: cg11787167 0.0018907717 0.004120082 0.006666667 0.0018907717
## 82: cg04798314 0.0016426706 0.003483939 0.006666667 0.0016426706
## 83: cg13080267 0.0014286738 0.003472139 0.006666667 0.0014286738
## 84: cg09584650 0.0013610219 0.003041222 0.006666667 0.0013610219
## 85: cg11706829 0.0012386731 0.003152206 0.006666667 0.0012386731
## 86: cg12466610 0.0012047381 0.002835023 0.006666667 0.0012047381
## 87: cg00648024 0.0011587124 0.002666322 0.006666667 0.0011587124
## 88: cg12434901 0.0011400026 0.002832174 0.006666667 0.0011400026
## 89: cg17296678 0.0009650143 0.002852442 0.006666667 0.0009650143
## 90: cg02217425 0.0009648252 0.002504687 0.006666667 0.0009648252
## 91: cg19555075 0.0009560607 0.002561946 0.006666667 0.0009560607
## 92: cg19248407 0.0008043038 0.001782882 0.006666667 0.0008043038
## 93: cg02489327 0.0006623852 0.002081222 0.006666667 0.0006623852
## 94: cg11835797 0.0005514064 0.001917455 0.006666667 0.0005514064
## 95: cg04867412 0.0004451990 0.001543768 0.006666667 0.0004451990
## 96: cg00675157 0.0004233330 0.001512818 0.006666667 0.0004233330
## 97: cg03115532 0.0003918513 0.001502284 0.006666667 0.0003918513
##        Feature         Gain       Cover   Frequency   Importance
stopCluster(c2)
registerDoSEQ()
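The ordered importance table above feeds the Top-N selection step described at the head of this version: keep the N highest-ranked features per model. A minimal base-R sketch of that step (the toy `imp` table and `topN` value are illustrative, not the real output):

```r
# Toy stand-in for an xgb.importance()-style table (Feature + Gain columns).
imp <- data.frame(
  Feature = c("cg_A", "cg_B", "cg_C", "cg_D", "cg_E"),
  Gain    = c(0.05, 0.20, 0.10, 0.40, 0.25)
)

# Keep the Top-N features by Gain, as done for each trained model.
topN <- 3
top_features <- head(imp[order(-imp$Gain), "Feature"], topN)
print(top_features)  # "cg_D" "cg_E" "cg_B"
```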
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7587

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # Use palette colours 2..(K+1) so the curves match the legend entries.
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    
    modelTrain_xgb_AUC<-mean_auc
}
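The mean-AUC step above macro-averages one-vs-rest AUCs, one per class. The same quantity can be computed without pROC via the rank (Mann-Whitney) formula, AUC = (Σ ranks of positives − n₊(n₊+1)/2) / (n₊·n₋). A self-contained sketch with made-up labels and scores:

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One-vs-rest macro average over three made-up classes.
set.seed(1)
y <- factor(sample(c("CN", "MCI", "Dementia"), 60, replace = TRUE))
probs <- matrix(runif(60 * 3), ncol = 3,
                dimnames = list(NULL, c("CN", "MCI", "Dementia")))
aucs <- sapply(levels(y), function(cl)
  auc_rank(as.integer(y == cl), probs[, cl]))
mean(aucs)  # macro-averaged one-vs-rest AUC
```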
print(modelTrain_xgb_AUC)
## Area under the curve: 0.7587

5. Random Forest

5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 221 samples
## 282 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.7014141  0.00000000
##   142   0.7058586  0.06388588
##   282   0.7014141  0.02249634
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 142.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
modelTrain_mean_accuracy_cv_rf <- mean_accuracy_rf_model
print(modelTrain_mean_accuracy_cv_rf)
## [1] 0.7028956
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")

train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
modelTrain_rf_trainAccuracy <- train_accuracy
print(modelTrain_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_modelTrain_rf <- caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_modelTrain_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       26
##   Dementia  0        2
##                                           
##                Accuracy : 0.7234          
##                  95% CI : (0.6215, 0.8107)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.3728          
##                                           
##                   Kappa : 0.0975          
##                                           
##  Mcnemar's Test P-Value : 9.443e-07       
##                                           
##             Sensitivity : 1.00000         
##             Specificity : 0.07143         
##          Pos Pred Value : 0.71739         
##          Neg Pred Value : 1.00000         
##              Prevalence : 0.70213         
##          Detection Rate : 0.70213         
##    Detection Prevalence : 0.97872         
##       Balanced Accuracy : 0.53571         
##                                           
##        'Positive' Class : CN              
## 
cm_modelTrain_rf_Accuracy <- cm_modelTrain_rf$overall["Accuracy"]
cm_modelTrain_rf_Kappa <- cm_modelTrain_rf$overall["Kappa"]
print(cm_modelTrain_rf_Accuracy)
##  Accuracy 
## 0.7234043
print(cm_modelTrain_rf_Kappa)
##      Kappa 
## 0.09748892
importance_rf_model <- varImp(rf_model)

print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 282)
## 
##            Importance
## cg04124201     100.00
## cg12333628      98.82
## cg17296678      98.54
## cg24861747      96.80
## cg12012426      94.86
## cg06864789      90.56
## cg17419220      90.32
## cg12776173      90.07
## cg25174111      86.13
## cg17118775      83.72
## cg20507276      81.64
## cg12080266      80.08
## cg18339359      79.10
## cg21575308      78.18
## cg03167407      78.06
## cg05841700      77.68
## cg06264882      77.24
## cg03115532      76.90
## cg07951602      76.88
## cg00051154      76.65
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))

print(Ordered_importance_rf_final_model)
  
}
##               CN     Dementia
## 1    2.385972379  2.385972379
## 2    2.333023115  2.333023115
## 3    2.320579943  2.320579943
## 4    2.242541742  2.242541742
## 5    2.155788154  2.155788154
## 6    1.962651415  1.962651415
## 7    1.952273697  1.952273697
## 8    1.940836248  1.940836248
## 9    1.764479439  1.764479439
## 10   1.656201594  1.656201594
## ...
## (rows 11-282 omitted for brevity; the full table has 282 rows, and the
##  CN and Dementia columns are identical)
if(METHOD_FEATURE_FLAG==3){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))

print(Ordered_importance_rf_final_model)
  
}
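Note that `arrange()` on the `varImp(rf_model$finalModel)` data frame drops its row names, which is why the printed importance table in this section shows numeric indices instead of CpG IDs. A sketch of one way to keep the labels, moving row names into an explicit column before sorting (the toy table is illustrative):

```r
# Toy importance table with feature IDs stored as row names,
# mimicking the varImp(rf_model$finalModel) output.
imp <- data.frame(CN = c(1.2, 2.4, 0.7), Dementia = c(1.2, 2.4, 0.7),
                  row.names = c("cg_A", "cg_B", "cg_C"))

# Move row names into a Feature column before ordering,
# so the CpG IDs survive arrange()/order().
imp$Feature <- rownames(imp)
imp_ordered <- imp[order(-imp$Dementia), c("Feature", "CN", "Dementia")]
print(imp_ordered)  # cg_B first, then cg_A, then cg_C
```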
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, rank each feature by the maximum
  # importance it attains across classes.
  # Add a column holding that maximum importance.
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
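The `pmax()` step above can be illustrated on a toy three-class importance table (the names and values are illustrative):

```r
# Toy per-class importance table for the multi-class case.
imp <- data.frame(
  Feature  = c("cg_A", "cg_B", "cg_C"),
  CN       = c(10, 40, 25),
  Dementia = c(35, 20, 30),
  MCI      = c(15, 25, 50)
)

# pmax() takes the element-wise (row-wise) maximum across the class
# columns, so each feature is ranked by its best class-specific score.
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp[order(-imp$MaxImportance), c("Feature", "MaxImportance")]
# cg_C (50), then cg_B (40), then cg_A (35)
```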
if(METHOD_FEATURE_FLAG == 1){
  
  library(reshape2)  # provides melt() for wide-to-long reshaping
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("The top 20 features ranked by maximum importance across classes:")
  print(head(importance_rf_model_df, n = 20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.7927

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # Use palette colours 2..(K+1) so the curves match the legend entries.
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_rf_AUC <- mean_auc
}
print(modelTrain_rf_AUC)
## Area under the curve: 0.7927

6. SVM

6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 221 samples
## 282 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 177, 177, 176, 177 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.9003030  0.7808150
##   0.50  0.9139394  0.8102123
##   1.00  0.9139394  0.8102123
## 
## Tuning parameter 'sigma' was held constant at a value of 0.00178696
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.00178696 and C = 0.5.
print(svm_model$bestTune)
##        sigma   C
## 2 0.00178696 0.5
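For reference, `method = "svmRadial"` uses kernlab's Gaussian RBF kernel, k(x, y) = exp(−σ Σ(xᵢ − yᵢ)²), with σ held at the tuned value shown above. A minimal sketch of the kernel evaluation (the vectors `x` and `y` are made up):

```r
# Gaussian RBF kernel as parameterised by kernlab (used by svmRadial):
# k(x, y) = exp(-sigma * sum((x - y)^2))
rbf_kernel <- function(x, y, sigma) exp(-sigma * sum((x - y)^2))

sigma <- 0.00178696      # the tuned value reported above
x <- c(0.1, 0.5, 0.9)
y <- c(0.2, 0.4, 0.7)

rbf_kernel(x, x, sigma)  # identical points -> exactly 1
rbf_kernel(x, y, sigma)  # nearby points -> slightly below 1
```

With such a small σ, even distant points get kernel values near 1, which is typical when features number in the hundreds.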
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.9093939
modelTrain_mean_accuracy_cv_svm <- mean_accuracy_svm_model
print(modelTrain_mean_accuracy_cv_svm)
## [1] 0.9093939
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.986425339366516"
modelTrain_svm_trainAccuracy <-train_accuracy
print(modelTrain_svm_trainAccuracy)
## [1] 0.9864253
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_modelTrain_svm <- caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_modelTrain_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       58        2
##   Dementia  8       26
##                                          
##                Accuracy : 0.8936         
##                  95% CI : (0.813, 0.9478)
##     No Information Rate : 0.7021         
##     P-Value [Acc > NIR] : 8.516e-06      
##                                          
##                   Kappa : 0.7604         
##                                          
##  Mcnemar's Test P-Value : 0.1138         
##                                          
##             Sensitivity : 0.8788         
##             Specificity : 0.9286         
##          Pos Pred Value : 0.9667         
##          Neg Pred Value : 0.7647         
##              Prevalence : 0.7021         
##          Detection Rate : 0.6170         
##    Detection Prevalence : 0.6383         
##       Balanced Accuracy : 0.9037         
##                                          
##        'Positive' Class : CN             
## 
cm_modelTrain_svm_Accuracy <- cm_modelTrain_svm$overall["Accuracy"]
cm_modelTrain_svm_Kappa <- cm_modelTrain_svm$overall["Kappa"]
print(cm_modelTrain_svm_Accuracy)
## Accuracy 
## 0.893617
print(cm_modelTrain_svm_Kappa)
##     Kappa 
## 0.7604485

Let’s take a look at the feature importance of the trained model.

library(iml)

predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 315 rows and 283 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg01680303     1.0153846   1.153846      1.215385        0.04761905
## 2        PC1     1.0000000   1.076923      1.076923        0.04444444
## 3 cg11834635     1.0769231   1.076923      1.076923        0.04444444
## 4 cg12012426     1.0000000   1.076923      1.076923        0.04444444
## 5 cg20218135     0.9230769   1.076923      1.076923        0.04444444
## 6 cg18662228     1.0153846   1.076923      1.138462        0.04444444
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
    nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4|| METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (test_data_SVM1$DX Dementia) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9821
## [1] "The AUC value is:"
## Area under the curve: 0.9821

if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # colors 2..(n+1) for the curves match the legend below
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_svm_AUC <- mean_auc
}

7. Important Features

7.0 Choose Number of Top Features

# GOTO "INPUT" Session to set the Number of common features needed

NUM_COMMON_FEATURES <- NUM_COMMON_FEATURES_SET

7.1 Merge Important Features

The feature importances cannot be combined directly, since they are not all measured on the same scale; for example, the SVM model uses a different (permutation-based) method to compute feature importance.

So let’s scale the importances to bring them into the same range.

First, let’s process each data frame to ensure they have a consistent format.

if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
# Process the dataframe to ensure they have consistent format.

# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"

head(importance_SVM_df_processed)

# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
importance_model_LRM1_df_processed$Feature<-rownames(importance_model_LRM1_df_processed)
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "Overall"] <- "Importance_LRM1"

head(importance_model_LRM1_df_processed)

# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed$Feature<-rownames(importance_elastic_net_model1_df_processed)
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "Overall"] <- "Importance_ENM1"

head(importance_elastic_net_model1_df_processed)



# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"

head(importance_xgb_model_df_processed)


# RF

importance_rf_model_df_processed <- importance_rf_model_df

if (METHOD_FEATURE_FLAG_NUM == 3){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(CI, CN))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}

if (METHOD_FEATURE_FLAG_NUM == 4){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia, CN))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}



if (METHOD_FEATURE_FLAG_NUM == 5){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, CN))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}

if (METHOD_FEATURE_FLAG_NUM == 6){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, Dementia))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}


head(importance_rf_model_df_processed)


}

From the above (binary case), the data frames now share the same structure, with an importance column (‘Importance_<model>’) and a ‘Feature’ column.

If our case is multi-class classification, see below. Except for the XGBoost and SVM models, the feature importance of each model is computed as the maximum importance across the classes.

if(METHOD_FEATURE_FLAG == 1){
  
# Process the dataframe to ensure they have consistent format.

# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"

head(importance_SVM_df_processed)

# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "MaxImportance"] <- "Importance_LRM1"
importance_model_LRM1_df_processed <- subset(importance_model_LRM1_df_processed, select = -c(Dementia,MCI, CN))
head(importance_model_LRM1_df_processed)

# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed <- subset(importance_elastic_net_model1_df_processed, select = -c(Dementia,MCI, CN))

colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "MaxImportance"] <- "Importance_ENM1"

head(importance_elastic_net_model1_df_processed)



# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)

colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"


head(importance_xgb_model_df_processed)


# RF

importance_rf_model_df_processed <- importance_rf_model_df
  
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia,MCI, CN))
  
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "MaxImportance"] <- "Importance_RF"

head(importance_rf_model_df_processed)

}

Then let’s do the scaling; here we choose min-max scaling.

importance_list <- list(logistic = importance_model_LRM1_df_processed, 
                        xgb = importance_xgb_model_df_processed, 
                        elastic_net = importance_elastic_net_model1_df_processed, 
                        rf = importance_rf_model_df_processed, 
                        svm = importance_SVM_df_processed)


min_max_scale_Imp<-function(df){
  x<-df[, grepl("Importance_", colnames(df))]
  df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
  return(df)
}

for (i in seq_along(importance_list)) {
    importance_list[[i]] <- min_max_scale_Imp(importance_list[[i]])
}


# Print each data frame after scaling
print(head(importance_list[[1]]))
##            Importance_LRM1    Feature
## age.now        0.001866599    age.now
## PC1            1.000000000        PC1
## PC2            0.517307312        PC2
## PC3            0.003292518        PC3
## cg02483977     0.000000000 cg02483977
## cg17348244     0.000000000 cg17348244
print(head(importance_list[[2]]))
##            Importance_XGB    Feature
## cg23836570      1.0000000 cg23836570
## cg06864789      0.9470049 cg06864789
## cg01013522      0.9395130 cg01013522
## cg24861747      0.9163772 cg24861747
## cg23698271      0.8988937 cg23698271
## cg00999469      0.8972485 cg00999469
print(head(importance_list[[3]]))
##            Importance_ENM1    Feature
## age.now         0.00000000    age.now
## PC1             1.00000000        PC1
## PC2             0.55080635        PC2
## PC3             0.69092669        PC3
## cg02483977      0.06391268 cg02483977
## cg17348244      0.08187197 cg17348244
print(head(importance_list[[4]]))
##            Importance_RF    Feature
## age.now        0.6139348    age.now
## PC1            0.6044726        PC1
## PC2            0.4013291        PC2
## PC3            0.5230772        PC3
## cg02483977     0.2644667 cg02483977
## cg17348244     0.5734020 cg17348244
print(head(importance_list[[5]]))
##   Importance_SVM    Feature
## 1            1.0 cg01680303
## 2            0.8        PC1
## 3            0.8 cg11834635
## 4            0.8 cg12012426
## 5            0.8 cg20218135
## 6            0.8 cg18662228
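One caveat with min-max scaling: if a model assigned the same importance to every feature, `max(x) - min(x)` would be zero and the scaling would produce `NaN`. A hedged variant of the helper (the `_safe` name is illustrative, not part of the pipeline) that guards against this:

```r
# Min-max scale the "Importance_" column, guarding against a zero range.
min_max_scale_Imp_safe <- function(df) {
  idx <- grepl("Importance_", colnames(df))
  x <- df[, idx]
  rng <- max(x) - min(x)
  # A constant importance column carries no ranking information, so map it to 0
  df[, idx] <- if (rng == 0) 0 else (x - min(x)) / rng
  df
}

toy <- data.frame(Importance_RF = c(2, 2, 2), Feature = c("a", "b", "c"))
min_max_scale_Imp_safe(toy)$Importance_RF  # 0 0 0 instead of NaN NaN NaN
```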

Now let’s merge the data frames of scaled feature importances.

# Merge all importances
combined_importance <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), importance_list)

head(combined_importance)
# Replace NA with 0
combined_importance[is.na(combined_importance)] <- 0

# Exclude DX, as it's label

combined_importance <- combined_importance %>% 
  filter(Feature != "DX")

# View the filtered dataframe
head(combined_importance)
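To see why the `NA`-to-zero replacement is needed: the outer merge (`all = TRUE`) keeps every feature that appears in any model’s table, filling the columns of models that did not rank a feature with `NA`. A minimal sketch with made-up importances:

```r
a <- data.frame(Feature = c("cg1", "cg2"), Importance_A = c(1.0, 0.5))
b <- data.frame(Feature = c("cg2", "cg3"), Importance_B = c(0.8, 0.2))

combined <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), list(a, b))
combined
#   Feature Importance_A Importance_B
# 1     cg1          1.0           NA
# 2     cg2          0.5          0.8
# 3     cg3           NA          0.2

combined[is.na(combined)] <- 0  # treat an unranked feature as zero importance
```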

7.2 View the Important Features

7.2.1 Select Based on AVG

Here we select the TOP Number of important features based on their average importance across the models. (See the following.)

combined_importance_AVF <- combined_importance
# Calculate average importance
combined_importance_AVF$Average_Importance <- rowMeans(combined_importance_AVF[,-1])

head(combined_importance_AVF)
combined_importance_Avg_ordered <- combined_importance_AVF[order(-combined_importance_AVF$Average_Importance),]

head(combined_importance_Avg_ordered)
# Top Number of common important features

print("the Top number of common features here is set to:")
## [1] "the Top number of common features here is set to:"
print(NUM_COMMON_FEATURES)
## [1] 20
top_Num_combined_importance_Avg_ordered <- head(combined_importance_Avg_ordered,n = NUM_COMMON_FEATURES)
print(top_Num_combined_importance_Avg_ordered)
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 280        PC1      1.00000000      0.1064562      1.00000000     0.6044726            0.8          0.7021858
## 256 cg24861747      0.16927233      0.9163772      0.29566395     0.9680001            0.8          0.6298627
## 89  cg06864789      0.23988367      0.9470049      0.33511358     0.9055556            0.6          0.6055116
## 13  cg01013522      0.24921505      0.9395130      0.35428316     0.6198419            0.8          0.5925706
## 53  cg04124201      0.22992632      0.5485345      0.34334351     1.0000000            0.6          0.5443609
## 181 cg15775217      0.21010245      0.7102063      0.32921469     0.5048532            0.8          0.5108753
## 272 cg27114706      0.21247515      0.8491637      0.30752750     0.7530075            0.4          0.5044348
## 244 cg23698271      0.16220028      0.8988937      0.30298027     0.5096729            0.6          0.4947494
## 257 cg25174111      0.06875761      0.7147864      0.18923308     0.8613427            0.6          0.4868240
## 24  cg02356645      0.27948204      0.5878679      0.32959632     0.6062590            0.6          0.4806411
## 172 cg14780448      0.23794700      0.5076783      0.30996839     0.7364885            0.6          0.4784164
## 266 cg26739327      0.24314999      0.6878896      0.34751982     0.4947193            0.6          0.4746557
## 246 cg23836570      0.00000000      1.0000000      0.09745053     0.6486526            0.6          0.4692206
## 20  cg02078724      0.19277821      0.3583911      0.24857296     0.7252550            0.8          0.4649995
## 209 cg18037388      0.10501897      0.8169682      0.25105265     0.7297437            0.4          0.4605567
## 225 cg20507276      0.19044849      0.5925365      0.29999137     0.8164163            0.4          0.4598785
## 139 cg12080266      0.29691142      0.2794154      0.31192646     0.8008323            0.6          0.4578171
## 12  cg00999469      0.08343347      0.8972485      0.16138164     0.5247286            0.6          0.4533584
## 93  cg07152869      0.27045247      0.3128570      0.42814230     0.6202573            0.6          0.4463418
## 211 cg18339359      0.15671520      0.4324981      0.24943531     0.7909717            0.6          0.4459241
# Top Number of common important features' name

top_Num_combined_importance_Avg_ordered_Nam <- top_Num_combined_importance_Avg_ordered$Feature

print(top_Num_combined_importance_Avg_ordered_Nam)
##  [1] "PC1"        "cg24861747" "cg06864789" "cg01013522" "cg04124201" "cg15775217" "cg27114706" "cg23698271" "cg25174111" "cg02356645" "cg14780448" "cg26739327" "cg23836570" "cg02078724" "cg18037388"
## [16] "cg20507276" "cg12080266" "cg00999469" "cg07152869" "cg18339359"

Visualization of the average feature importance with a bar plot

ggplot(combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() +  # Flip coordinates to make it horizontal
  labs(title = "Feature Importance Sorted by Average Value",
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

Visualization of the top features’ average importance with a bar plot

ggplot(top_Num_combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() + 
  labs(title = paste("Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Average Value"),
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

7.2.2 Select Based on Quantile

The following shows how to select the TOP Number of important features based on a specific quantile of the importance. (Here we use the median, i.e. the 50% quantile.)

Let’s create a new data frame with several quantiles of the feature importance for each model, order it by the 50% quantile from high to low, and select the top features based on that.

quantiles <- t(apply(combined_importance[,-1], 1, function(x) quantile(x, probs = c(0,0.25, 0.5, 0.75,1))))

combined_importance_quantiles <- cbind(Feature = combined_importance$Feature, quantiles)

combined_importance_quantiles <- as.data.frame(combined_importance_quantiles)
combined_importance_quantiles$`50%` <- as.numeric(combined_importance_quantiles$`50%`)
combined_importance_quantiles$`0%` <- as.numeric(combined_importance_quantiles$`0%`)

combined_importance_quantiles$`25%` <- as.numeric(combined_importance_quantiles$`25%`)

combined_importance_quantiles$`75%` <- as.numeric(combined_importance_quantiles$`75%`)

combined_importance_quantiles$`100%` <- as.numeric(combined_importance_quantiles$`100%`)

# Sort by median importance (50th percentile)
combined_importance_quantiles <- combined_importance_quantiles[order(-combined_importance_quantiles$`50%`), ]


head(combined_importance_quantiles)
top_Num_median_features_imp <- head(combined_importance_quantiles,n = NUM_COMMON_FEATURES)
print(top_Num_median_features_imp)
##        Feature         0%        25%       50%       75%      100%
## 256 cg24861747 0.16927233 0.29566395 0.8000000 0.9163772 0.9680001
## 280        PC1 0.10645624 0.60447256 0.8000000 1.0000000 1.0000000
## 13  cg01013522 0.24921505 0.35428316 0.6198419 0.8000000 0.9395130
## 55  cg04242342 0.10714832 0.18656325 0.6000000 0.6130838 0.6781069
## 89  cg06864789 0.23988367 0.33511358 0.6000000 0.9055556 0.9470049
## 246 cg23836570 0.00000000 0.09745053 0.6000000 0.6486526 1.0000000
## 257 cg25174111 0.06875761 0.18923308 0.6000000 0.7147864 0.8613427
## 24  cg02356645 0.27948204 0.32959632 0.5878679 0.6000000 0.6062590
## 53  cg04124201 0.22992632 0.34334351 0.5485345 0.6000000 1.0000000
## 12  cg00999469 0.08343347 0.16138164 0.5247286 0.6000000 0.8972485
## 281        PC2 0.00000000 0.40132912 0.5173073 0.5508064 0.6000000
## 244 cg23698271 0.16220028 0.30298027 0.5096729 0.6000000 0.8988937
## 172 cg14780448 0.23794700 0.30996839 0.5076783 0.6000000 0.7364885
## 181 cg15775217 0.21010245 0.32921469 0.5048532 0.7102063 0.8000000
## 266 cg26739327 0.24314999 0.34751982 0.4947193 0.6000000 0.6878896
## 143 cg12279734 0.08015036 0.20968081 0.4929339 0.5275020 0.6000000
## 237 cg22274273 0.15609202 0.20723822 0.4596496 0.5382838 0.6000000
## 189 cg16390578 0.11075316 0.16638133 0.4551840 0.6000000 0.8588247
## 109 cg09650803 0.16424736 0.17504495 0.4396123 0.5133361 0.6000000
## 211 cg18339359 0.15671520 0.24943531 0.4324981 0.6000000 0.7909717
top_Num_median_features_Name<-top_Num_median_features_imp$Feature
print(top_Num_median_features_Name)
##  [1] "cg24861747" "PC1"        "cg01013522" "cg04242342" "cg06864789" "cg23836570" "cg25174111" "cg02356645" "cg04124201" "cg00999469" "PC2"        "cg23698271" "cg14780448" "cg15775217" "cg26739327"
## [16] "cg12279734" "cg22274273" "cg16390578" "cg09650803" "cg18339359"
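A note on the block of `as.numeric` conversions above: `cbind()` of a character vector (`Feature`) with a numeric quantile matrix coerces the whole result to character, so every quantile column comes back as text. A minimal illustration:

```r
q <- t(apply(matrix(1:4, nrow = 2), 1, function(x) quantile(x, probs = c(0, 1))))
m <- cbind(Feature = c("a", "b"), q)
class(m[, "0%"])  # "character" -- cbind() coerced the numbers to strings
```

An alternative that avoids the conversions entirely is to build the table with `data.frame(Feature = combined_importance$Feature, quantiles, check.names = FALSE)`, which keeps the quantile columns numeric (`check.names = FALSE` preserves column names such as `50%`).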

Visualization with a box plot.

library(tidyr)

long_df <- pivot_longer(combined_importance_quantiles, 
                        cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
                        names_to = "Quantile",
                        values_to = "Importance")

ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_boxplot() +
  coord_flip() +  
  labs(title = "Distribution of Feature Importances",
       x = "Feature",
       y = "Importance") +
  theme_minimal()


Visualization of the top features with a box plot.

library(tidyr)

long_df <- pivot_longer(top_Num_median_features_imp, 
                        cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
                        names_to = "Quantile",
                        values_to = "Importance")

ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_boxplot() +
  coord_flip() +
  labs(
    title = paste("Distribution of Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Median Value"),
       x = "Feature",
       y = "Importance") +
  theme_minimal()

7.2.3 Select Based on Frequency/Common

The frequency / common feature selection is performed as follows:

  1. Select the TOP Number of features (say 40) for each model (this number is set by “NUM_COMMON_FEATURES_SET_Frequency” in the INPUT session).
  2. Calculate how often each feature appears among the Top Number of features selected in step 1.
  3. Each feature that appears at least half of the time is considered important; collect these important features as the common features.
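The three steps can be sketched compactly on toy top-feature lists (the feature names and models below are made up): concatenate the per-model lists, tabulate how often each feature appears, and keep the features whose count reaches half the number of models.

```r
# Step 1: toy top-feature lists for three hypothetical models
tops <- list(m1 = c("cg1", "cg2", "cg3"),
             m2 = c("cg2", "cg3", "cg4"),
             m3 = c("cg3", "cg5", "cg1"))

# Step 2: appearance frequency across the lists
counts <- table(unlist(tops))

# Step 3: keep features appearing in at least half of the models
common <- names(counts[counts >= length(tops) / 2])
common  # "cg1" "cg2" "cg3"
```

The code below implements the same idea with an explicit 0/1 presence matrix, which additionally records which model contributed each feature.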
n_select_frequencyWay <- NUM_COMMON_FEATURES_SET_Frequency
combined_importance_freq_ordered_df<-combined_importance_Avg_ordered
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature

# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature


# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature


# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature


# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))

models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models), 
                         dimnames = list(all_features, models))

# Fill the dataframe indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
  feature_matrix[feature, "LRM"] <- 
    as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
  feature_matrix[feature, "XGB"] <- 
    as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
  feature_matrix[feature, "ENM"] <- 
    as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
  feature_matrix[feature, "RF"] <- 
    as.integer(feature %in% top_impAvg_orderby_RF_NAME)
  feature_matrix[feature, "SVM"] <- 
    as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}

feature_df <- as.data.frame(feature_matrix)

print(head(feature_df))
##            LRM XGB ENM RF SVM
## PC1          1   0   1  0   1
## PC2          1   0   1  0   0
## cg02872767   1   0   1  0   0
## cg11787167   1   0   1  0   0
## cg09216282   1   0   1  0   0
## cg01680303   1   0   1  0   1

For a quick read, we count how many times each feature appears by computing the row sums and adding the count as a column to the data frame.

feature_df$Total_Count <- rowSums(feature_df[,1:5])
feature_df <- feature_df[order(-feature_df$Total_Count), ]
frequency_feature_df_RAW_ordered<-feature_df
print(feature_df)
##            LRM XGB ENM RF SVM Total_Count
## cg01013522   1   1   1  0   1           4
## cg06864789   1   1   1  1   0           4
## PC1          1   0   1  0   1           3
## cg01680303   1   0   1  0   1           3
## cg02356645   1   1   1  0   0           3
## cg26739327   1   1   1  0   0           3
## cg24861747   0   1   0  1   1           3
## cg15775217   0   1   1  0   1           3
## PC2          1   0   1  0   0           2
## cg02872767   1   0   1  0   0           2
## cg11787167   1   0   1  0   0           2
## cg09216282   1   0   1  0   0           2
## cg12080266   1   0   0  1   0           2
## cg19503462   1   0   1  0   0           2
## cg07152869   1   0   1  0   0           2
## cg12858518   1   0   1  0   0           2
## cg26757229   1   0   1  0   0           2
## cg25174111   0   1   0  1   0           2
## cg20507276   0   1   0  1   0           2
## cg04124201   0   0   1  1   0           2
## cg12012426   0   0   0  1   1           2
## cg17419220   0   0   0  1   1           2
## cg12776173   0   0   0  1   1           2
## cg06378561   1   0   0  0   0           1
## cg12108278   1   0   0  0   0           1
## cg03084184   1   0   0  0   0           1
## cg14780448   1   0   0  0   0           1
## cg02932958   1   0   0  0   0           1
## cg23836570   0   1   0  0   0           1
## cg23698271   0   1   0  0   0           1
## cg00999469   0   1   0  0   0           1
## cg16390578   0   1   0  0   0           1
## cg13885788   0   1   0  0   0           1
## cg27114706   0   1   0  0   0           1
## cg25561557   0   1   0  0   0           1
## cg18037388   0   1   0  0   0           1
## cg24859648   0   1   0  0   0           1
## cg03172493   0   1   0  0   0           1
## cg04242342   0   1   0  0   0           1
## cg06697310   0   1   0  0   0           1
## PC3          0   0   1  0   0           1
## cg04109990   0   0   1  0   0           1
## cg03982462   0   0   1  0   0           1
## cg06870118   0   0   1  0   0           1
## cg12333628   0   0   0  1   0           1
## cg17296678   0   0   0  1   0           1
## cg17118775   0   0   0  1   0           1
## cg18339359   0   0   0  1   0           1
## cg21575308   0   0   0  1   0           1
## cg03167407   0   0   0  1   0           1
## cg05841700   0   0   0  1   0           1
## cg06264882   0   0   0  1   0           1
## cg03115532   0   0   0  1   0           1
## cg07951602   0   0   0  1   0           1
## cg00051154   0   0   0  1   0           1
## cg02078724   0   0   0  0   1           1
## cg20218135   0   0   0  0   1           1
## cg18662228   0   0   0  0   1           1
## cg07584620   0   0   0  0   1           1
## cg11314779   0   0   0  0   1           1
## cg10058204   0   0   0  0   1           1
## cg11358878   0   0   0  0   1           1
## cg10701746   0   0   0  0   1           1
## cg27341708   0   0   0  0   1           1
## cg04867412   0   0   0  0   1           1
## cg04771146   0   0   0  0   1           1
## cg02901522   0   0   0  0   1           1

Combine the result with the importance data frame.

all_features <- union(combined_importance_freq_ordered_df$Feature, rownames(feature_df))
# note that the combined data frame we use here is the one before filtering
# combine them according to the common-feature selection method:
# if a feature from the earlier importance data frame is absent here, add it with a value of zero.
feature_df_full <- data.frame(Feature = all_features)
feature_df_full <- merge(feature_df_full, feature_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_df_full[is.na(feature_df_full)] <- 0


# For top_impAvg_ordered
all_impAvg_ordered_full <- data.frame(Feature = all_features)
all_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,all_impAvg_ordered_full, by.x = "Feature", by.y = "Feature", all.x = TRUE)
all_impAvg_ordered_full[is.na(all_impAvg_ordered_full)] <- 0
all_combined_df_impAvg <- merge(feature_df_full, all_impAvg_ordered_full, by = "Feature", all = TRUE)

print(head(feature_df_full))
##      Feature LRM XGB ENM RF SVM Total_Count
## 1    age.now   0   0   0  0   0           0
## 2 cg00051154   0   0   0  1   0           1
## 3 cg00156497   0   0   0  0   0           0
## 4 cg00322003   0   0   0  0   0           0
## 5 cg00332268   0   0   0  0   0           0
## 6 cg00421199   0   0   0  0   0           0
print(head(all_impAvg_ordered_full))
##      Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now     0.001866599      0.0000000       0.0000000     0.6139348            0.4          0.2031603
## 2 cg00051154     0.008799961      0.0000000       0.1421357     0.7665459            0.4          0.2634963
## 3 cg00156497     0.041645211      0.0000000       0.1365957     0.6326300            0.6          0.2821742
## 4 cg00322003     0.213499057      0.0715005       0.3143059     0.5848384            0.4          0.3168288
## 5 cg00332268     0.071615682      0.0000000       0.1245249     0.5517628            0.6          0.2695807
## 6 cg00421199     0.178414680      0.2872727       0.2686122     0.4599969            0.4          0.3188593
print(head(all_combined_df_impAvg))
##      Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now   0   0   0  0   0           0     0.001866599      0.0000000       0.0000000     0.6139348            0.4          0.2031603
## 2 cg00051154   0   0   0  1   0           1     0.008799961      0.0000000       0.1421357     0.7665459            0.4          0.2634963
## 3 cg00156497   0   0   0  0   0           0     0.041645211      0.0000000       0.1365957     0.6326300            0.6          0.2821742
## 4 cg00322003   0   0   0  0   0           0     0.213499057      0.0715005       0.3143059     0.5848384            0.4          0.3168288
## 5 cg00332268   0   0   0  0   0           0     0.071615682      0.0000000       0.1245249     0.5517628            0.6          0.2695807
## 6 cg00421199   0   0   0  0   0           0     0.178414680      0.2872727       0.2686122     0.4599969            0.4          0.3188593
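The merge pattern above (left join on the full feature list, then zero-fill) can be illustrated with a toy sketch; the data below are hypothetical, not objects from this pipeline:

```r
# Left-join a hypothetical importance table onto the full feature list;
# features with no importance entry are filled with 0, as in the pipeline.
all_features <- c("cg000001", "cg000002", "age.now")

imp <- data.frame(Feature       = c("cg000001", "age.now"),
                  Importance_RF = c(0.9, 0.6))

full <- merge(data.frame(Feature = all_features), imp,
              by = "Feature", all.x = TRUE)   # keep every feature (left join)
full[is.na(full)] <- 0                        # absent features get importance 0

print(full)
```

Note that `merge()` sorts the result by the join key unless `sort = FALSE` is given.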

Frequency Feature Selection

Choose the mutual (common) important features: a feature is kept when it appears in at least half of the models’ (i.e., 3 out of 5 in our case) top-N important-feature lists.

if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count >= 3, ])
  df_process_mutual <- processed_data[, c("DX", df_process_mutual_FeatureName)]

  print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
## [1] "The number of final used features of common importance method: 8"
if(METHOD_FEATURE_FLAG == 1){
  df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count >= 3, ])
  df_process_mutual <- processed_data_m1[, c("DX", df_process_mutual_FeatureName)]

  print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
print(df_process_mutual_FeatureName)
## [1] "cg01013522" "cg06864789" "PC1"        "cg01680303" "cg02356645" "cg26739327" "cg24861747" "cg15775217"
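The frequency / common-feature rule can be sketched on toy data (the feature names and per-model lists below are hypothetical):

```r
# 1) take each model's top-N features, 2) count how many models selected
# each feature, 3) keep features chosen by at least half of the models.
top_lists <- list(
  LRM = c("cg1", "cg2", "cg3"),
  XGB = c("cg2", "cg3", "cg4"),
  RF  = c("cg3", "cg5", "cg6")
)
counts <- table(unlist(top_lists))                        # appearance frequency
common <- names(counts[counts >= ceiling(length(top_lists) / 2)])
print(common)   # features appearing in at least 2 of the 3 lists
```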

Importance of these features:

Top_Frequency_Feature_importance <- combined_importance_freq_ordered_df[
    combined_importance_freq_ordered_df$Feature %in% df_process_mutual_FeatureName,
]

print(Top_Frequency_Feature_importance)
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 280        PC1       1.0000000     0.10645624       1.0000000     0.6044726            0.8          0.7021858
## 256 cg24861747       0.1692723     0.91637723       0.2956640     0.9680001            0.8          0.6298627
## 89  cg06864789       0.2398837     0.94700495       0.3351136     0.9055556            0.6          0.6055116
## 13  cg01013522       0.2492151     0.93951297       0.3542832     0.6198419            0.8          0.5925706
## 181 cg15775217       0.2101024     0.71020632       0.3292147     0.5048532            0.8          0.5108753
## 24  cg02356645       0.2794820     0.58786787       0.3295963     0.6062590            0.6          0.4806411
## 266 cg26739327       0.2431500     0.68788955       0.3475198     0.4947193            0.6          0.4746557
## 18  cg01680303       0.3099023     0.07636624       0.3317030     0.4608488            1.0          0.4357641
ggplot(Top_Frequency_Feature_importance, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() + 
  labs(title = "Feature Importance Selected by the Frequency Method, Sorted by Average Value",
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

Important features selected by the frequency method but not by the mean method

# Check whether every feature selected by the mutual (frequency) method is also
# selected by the mean method, and print those that are not.

all(df_process_mutual_FeatureName %in% top_Num_combined_importance_Avg_ordered_Nam)
## [1] FALSE
Mutual_not_in_Mean <- setdiff(df_process_mutual_FeatureName, top_Num_combined_importance_Avg_ordered_Nam)
print(Mutual_not_in_Mean)
## [1] "cg01680303"
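This membership check reduces to `%in%` plus `setdiff()`, as in this toy sketch (hypothetical vectors, not the pipeline's objects):

```r
# Compare two selection methods' feature lists.
by_frequency <- c("cg1", "cg2", "cg3")
by_mean      <- c("cg2", "cg3", "cg4")

print(all(by_frequency %in% by_mean))            # [1] FALSE
only_in_frequency <- setdiff(by_frequency, by_mean)
print(only_in_frequency)                         # [1] "cg1"
```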

SAVE AS RDATA - MAY NOT BE NEEDED

Overview of the Data Frame Variables

Phenotype part data frame: “phenoticPart_RAW”

Raw merged data frame: “merged_df_raw”

Feature importance ordered by quantile: “combined_importance_quantiles”

Feature importance ordered by mean: “combined_importance_Avg_ordered”

Feature frequency / common-feature data frames:

  • “frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count.

  • “feature_df_full”: the frequencies of all features produced by the steps of the frequency method; not ordered.

  • “all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.

head(phenoticPart_RAW)
# 
# save(NUM_COMMON_FEATURES,
#      combined_importance_quantiles,
#      combined_importance_Avg_ordered,
#      frequency_feature_df_RAW_ordered,
#      top_Num_median_features_Name,
#      top_Num_combined_importance_Avg_ordered_Nam,
#      file = "Part2_V8_08_top_features_5KCpGs.RData")
# 
# save(processed_data_m3,processed_data_m3_df,AfterProcess_FeatureName_m3,file = "Part2_V8_08_BinaryMerged_5KCpGs.RData")
# 
# save(phenoticPart_RAW, merged_df_raw, file = "PhenotypeAndMerged.RData")

8. Feature Selection and Output

8.0 Input - Number of Top Features and Method Choice.

The feature selection methods:

  1. based on mean feature importance (set “INPUT_Method_Mean_Choose = TRUE”)
  2. based on median quantile feature importance (set “INPUT_Method_Median_Choose = TRUE”)
  3. based on feature frequency importance (set “INPUT_Method_Frequency_Choose = TRUE”)
    • Comment: with the frequency method, the input number of features N is only used in the first step, to select the top N features per model; the final number of features kept may not be exactly N.
  4. Setting a method’s flag to FALSE skips generating the data for that method; setting it to TRUE outputs the data set selected by that method. To output the data for every method, set all flags to TRUE.
Number_fea_input <- INPUT_NUMBER_FEATURES

Flag_8mean <- INPUT_Method_Mean_Choose 
Flag_8median <- INPUT_Method_Median_Choose 
Flag_8Fequency <- INPUT_Method_Frequency_Choose 
print(paste("the Top number of features here is set to:", Number_fea_input))
## [1] "the Top number of features here is set to: 250"
Flag_8mean
## [1] TRUE
Flag_8median
## [1] TRUE
Flag_8Fequency
## [1] TRUE

8.1 Selected Features for Output

Based on Mean

selected_impAvg_ordered <- head(combined_importance_Avg_ordered,n = Number_fea_input)
print(head(selected_impAvg_ordered))
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 280        PC1       1.0000000      0.1064562       1.0000000     0.6044726            0.8          0.7021858
## 256 cg24861747       0.1692723      0.9163772       0.2956640     0.9680001            0.8          0.6298627
## 89  cg06864789       0.2398837      0.9470049       0.3351136     0.9055556            0.6          0.6055116
## 13  cg01013522       0.2492151      0.9395130       0.3542832     0.6198419            0.8          0.5925706
## 53  cg04124201       0.2299263      0.5485345       0.3433435     1.0000000            0.6          0.5443609
## 181 cg15775217       0.2101024      0.7102063       0.3292147     0.5048532            0.8          0.5108753
print(dim(selected_impAvg_ordered))
## [1] 250   7
selected_impAvg_ordered_NAME <- selected_impAvg_ordered$Feature

print(head(selected_impAvg_ordered_NAME))
## [1] "PC1"        "cg24861747" "cg06864789" "cg01013522" "cg04124201" "cg15775217"
df_selected_Mean <- processed_dataFrame[,c("DX",selected_impAvg_ordered_NAME)]
print(head(df_selected_Mean))
##                           DX          PC1 cg24861747 cg06864789 cg01013522 cg04124201 cg15775217 cg27114706 cg23698271 cg25174111 cg02356645 cg14780448 cg26739327 cg23836570 cg02078724 cg18037388
## 200223270003_R03C01       CN -0.172761185  0.4309505  0.4605312  0.8862821  0.3308589  0.9168327  0.9359259  0.9109565  0.8573844  0.5833923 0.67021018  0.7693268 0.54259383  0.2896133  0.7545086
## 200223270003_R06C01       CN -0.003667305  0.8071462  0.8751365  0.5425308  0.3241613  0.6042521  0.9285384  0.9051701  0.2567745  0.5701428 0.62073547  0.8727608 0.03267304  0.2805612  0.7294565
## 200223270003_R07C01 Dementia -0.186779607  0.3347317  0.4902033  0.8429862  0.4332693  0.9062231  0.4787397  0.8804362  0.1903803  0.5683381 0.04425741  0.8340445 0.59939745  0.2739571  0.2391659
##                     cg20507276 cg12080266 cg00999469 cg07152869 cg18339359 cg16390578 cg04242342 cg01680303 cg20218135 cg25561557 cg03172493         PC2 cg19503462 cg13885788 cg18662228 cg22274273
## 200223270003_R03C01 0.38721972  0.9450629  0.2857719   0.505063  0.9040272 0.20983422  0.8167892  0.1344941 0.64278153 0.03851635 0.63362492  0.05745834  0.4537684  0.9369476  0.8730153  0.4246379
## 200223270003_R06C01 0.47978438  0.9363381  0.2499229   0.835249  0.8552121 0.06389068  0.8040357  0.7573869 0.06509247 0.47259480 0.06148804  0.08372861  0.6997359  0.5163017  0.8602464  0.4196796
## 200223270003_R07C01 0.02261996  0.6398247  0.2819622   0.519430  0.3073106 0.23101450  0.8286115  0.4772204 0.65642359 0.43364249 0.64562298 -0.01117250  0.7189778  0.9183376  0.8683578  0.4164100
##                     cg06697310 cg05096415 cg07584620 cg03084184 cg11314779 cg12279734 cg06378561 cg12012426 cg17419220 cg09650803 cg10058204 cg04109990 cg12776173 cg05841700 cg11358878 cg10542624
## 200223270003_R03C01  0.8653044  0.5177819  0.3763980  0.7877128  0.8966100  0.1494651  0.9377503  0.9434768 0.43470227  0.8954464  0.5834496  0.6476604  0.8730635  0.9146488 0.83252951 0.02189577
## 200223270003_R06C01  0.2405168  0.6288426  0.8530961  0.4546397  0.8908661  0.8760759  0.5154019  0.9220044 0.02781411  0.9113477  0.0549494  0.6692040  0.7009491  0.3737990 0.87521203 0.54330620
## 200223270003_R07C01  0.8479193  0.6060271  0.3888623  0.7812413  0.9048316  0.8674214  0.9403569  0.9241284 0.42803809  0.2518414  0.5689591  0.9024920  0.1136716  0.5046468 0.08917903 0.54991492
##                     cg06870118 cg14252149 cg10701746 cg11787167 cg23916408 cg05749243  cg22901347 cg03982462 cg12858518 cg17044529 cg09216282 cg24859648 cg02217425 cg24065597 cg17118775 cg15591384
## 200223270003_R03C01  0.8100144 0.02450779  0.4868342 0.04673831  0.9154993  0.9209685 0.001690332  0.6023731  0.9285252  0.9117895  0.9244259 0.44392797  0.1032503  0.2221098  0.5585676  0.7870275
## 200223270003_R06C01  0.7802055 0.02382413  0.4927257 0.32564508  0.8886255  0.9143061 0.103413834  0.8778458  0.9017533  0.9290636  0.9263996 0.03341185  0.6592850  0.7036129  0.2916054  0.7429614
## 200223270003_R07C01  0.7917257 0.56212480  0.8552180 0.43162543  0.8872447  0.9121180 0.632991482  0.8860227  0.9187879  0.9402858  0.9352308 0.43582347  0.8792021  0.2407676  0.2868948  0.8346279
##                     cg17296678 cg26901661 cg27341708 cg26983017 cg12306781 cg09584650 cg24851651 cg04867412          PC3 cg04771146 cg23350716 cg05373298 cg02901522 cg02095601 cg02872767 cg19555075
## 200223270003_R03C01  0.5653917  0.8754981 0.02613847 0.03145466  0.8663817 0.09661586 0.05358297  0.8796800  0.005055871  0.7648566  0.7876873 0.02652391  0.9372901  0.9161259  0.3886537  0.4921409
## 200223270003_R06C01  0.5272971  0.9021064 0.86893582 0.84677625  0.8027798 0.52399749 0.05968923  0.4497115  0.029143653  0.3125007  0.6960544 0.83538124  0.4954978  0.2233062  0.9099575  0.4261618
## 200223270003_R07C01  0.7661613  0.8556831 0.02642300 0.53922255  0.8787250 0.11587211 0.60864179  0.4445373 -0.032302430  0.2909958  0.7387498 0.89506024  0.9381188  0.8978191  0.8603283  0.4694729
##                     cg00421199 cg00322003 cg11716267 cg18526121 cg03392100 cg22681945 cg11834635 cg12074150 cg13226272 cg26948066 cg07456472 cg22653957 cg02389264 cg12471283 cg07138269 cg02656016
## 200223270003_R03C01  0.8532461  0.5702070 0.04959702  0.4762313  0.9227394  0.8388195  0.8880887 0.18602738  0.5410002  0.5026045  0.5856904  0.6442184  0.7900942  0.8658731  0.9426707  0.2355680
## 200223270003_R06C01  0.8891803  0.3077122 0.49143010  0.4833367  0.8902340  0.8700500  0.2493491 0.14231506  0.4437070  0.9101976  0.3886482  0.9531308  0.7789974  0.6963410  0.5057781  0.9052318
## 200223270003_R07C01  0.8937751  0.6104341 0.45857830  0.7761450  0.4359657  0.3344105  0.2210428 0.09201303  0.0265215  0.9379543  0.9186405  0.6534542  0.4174463  0.6680611  0.9400527  0.8653682
##                     cg16268937 cg10507965 cg16715186 cg02627240 cg24104387 cg12333628 cg12689021 cg03167407 cg17623720 cg25758034 cg18821122 cg03115532 cg09247979 cg08584917 cg13080267 cg04218584
## 200223270003_R03C01  0.8931712  0.4010973  0.7946153 0.57129408  0.5339034  0.9092861  0.7449475  0.7610292  0.8988624  0.6649219  0.5901603  0.8659608  0.5706177  0.9019732 0.78371483  0.8971263
## 200223270003_R06C01  0.9034556  0.4033691  0.8124316 0.05309659  0.3007614  0.5084647  0.7872237  0.3087606  0.8172384  0.2393844  0.5779620  0.8533871  0.5090215  0.9187789 0.09436069  0.8491768
## 200223270003_R07C01  0.8928450  0.3869543  0.7773263 0.52179136  0.7509780  0.5229394  0.7523141  0.2455453  0.8226085  0.7071501  0.9251431  0.4416574  0.5066661  0.6007449 0.09351259  0.9008137
##                     cg27452255 cg01280698 cg08242313 cg26007606 cg04831745 cg16089727 cg12240569 cg14924512 cg03327352 cg03187614 cg06012621 cg11109139 cg02932958 cg04467639 cg21575308 cg04664583
## 200223270003_R03C01  0.6593379 0.88462009  0.8953645  0.5615550 0.71214149 0.54996692 0.02690547  0.9160885  0.8786878  0.8826518  0.8579519  0.6350109  0.4210489  0.6400206 0.44702405  0.5881190
## 200223270003_R06C01  0.9012217 0.88471320  0.8573493  0.1463111 0.06871768 0.05876736 0.46030640  0.9088414  0.3042310  0.5131472  0.5325037  0.6904482  0.3825995  0.5657041 0.44792570  0.9352717
## 200223270003_R07C01  0.8898635 0.06370005  0.8992114  0.8101800 0.90994644 0.85485461 0.86185839  0.9081681  0.8273211  0.5281030  0.6263080  0.6274025  0.7617081  0.6302917 0.02822675  0.9350230
##                     cg02495179 cg14764203 cg17906851 cg19512141 cg00156497 cg16361249 cg00977253 cg21757617 cg15700429 cg07951602 cg16338321 cg05377703 cg11227702 cg05161773 cg02489327 cg23432430
## 200223270003_R03C01  0.7373055  0.4683709  0.9529718  0.7903543  0.5194653 0.52843073  0.9145988  0.4429909  0.9114530  0.8766842  0.8294062  0.8213047 0.49184121  0.4154907  0.8616312  0.9455418
## 200223270003_R06C01  0.5588114  0.8916566  0.6462151  0.8404684  0.9024063 0.09039669  0.8944518  0.4472538  0.8838233  0.8918089  0.4918708  0.5152514 0.02543724  0.8526849  0.8777949  0.9418716
## 200223270003_R07C01  0.5273309  0.8714472  0.9553497  0.2202759  0.9067989 0.42039062  0.9150206  0.4339315  0.9095363  0.8706938  0.5245645  0.7773036 0.45150971  0.4259275  0.4205073  0.9426559
##                     cg14181112 cg27187580 cg12108278 cg21533482 cg02981548 cg11173002 cg10786572 cg20913114 cg02302183 cg00332268 cg03359067  cg03088219 cg26889118 cg00051154 cg16536985 cg06264882
## 200223270003_R03C01  0.1615405  0.6643576  0.9243869  0.8288469  0.5220037  0.5913599  0.5982086 0.80382984  0.9191148  0.9044887  0.8628564 0.007435243  0.9154836 0.08370609  0.5418687 0.43678655
## 200223270003_R06C01  0.3424621  0.6914924  0.9068995  0.6766373  0.5098965  0.1878736  0.0935115 0.03158439  0.8749250  0.5777209  0.8144536 0.120155222  0.9101336 0.61288950  0.8392044 0.43703442
## 200223270003_R07C01  0.2178314  0.9357074  0.9131367  0.6235932  0.5660985  0.5150840  0.8436837 0.81256840  0.8888247  0.5848006  0.8737908 0.826554308  0.5759967 0.07638127  0.8822891 0.02439581
##                     cg21986118 cg04798314 cg23813394 cg16310958 cg15730644 cg05351360 cg11835797 cg00841008 cg12284872 cg14465143 cg00675157 cg17348244 cg07304760 cg06624143 cg26089705 cg12702014
## 200223270003_R03C01  0.6571296 0.07119798 0.48811365  0.9300073  0.4353906 0.03855181  0.9007408 0.61899333  0.7414569  0.5543068  0.9242325 0.81793075  0.5798534  0.4899758 0.50810373  0.7848681
## 200223270003_R06C01  0.7034445 0.09248843 0.02943436  0.9228871  0.8763048 0.76395533  0.8944957 0.05401588  0.7725267  0.2702875  0.9254708 0.07241099  0.5575516  0.9107688 0.03322136  0.8065993
## 200223270003_R07C01  0.9055894 0.06972566 0.92935625  0.8539019  0.4833709 0.77000888  0.8168544 0.90769205  0.7573369  0.2621492  0.5447244 0.78025001  0.9195617  0.9217350 0.03118009  0.7458594
##                     cg04033559 cg21501207 cg14904299 cg03057303 cg12213037 cg22071943 cg17429539 cg21578644 cg24422984 cg13799572 cg12556569 cg12421087 cg27286614 cg07971231 cg16733676 cg27224751
## 200223270003_R03C01  0.8768243  0.6813712  0.2712472  0.8923039   0.248785  0.2442648  0.7100923  0.9260863  0.5462594  0.8449584 0.03924599  0.5399655  0.5933858  0.8406145  0.8904541 0.03214912
## 200223270003_R06C01  0.8257388  0.4747229  0.8364544  0.4954311   0.812695  0.2644581  0.7660838  0.9159726  0.5193121  0.4409219 0.48636893  0.5400348  0.6348795  0.8447914  0.1698111 0.83123722
## 200223270003_R07C01  0.8900962  0.7422003  0.8193867  0.4695066   0.506374  0.2599947  0.6984969  0.9178001  0.1970387  0.8516975 0.46498877  0.5291975  0.9468370  0.8874706  0.9203317 0.79732117
##                     cg01130884 cg16020483 cg12925689 cg05813498 cg19248407 cg26474732 cg05130642 cg04845852 cg16098618 cg05138546 cg17811452 cg26081710 cg01097733 cg01608425 cg17329602 cg03640465
## 200223270003_R03C01  0.6230659  0.1673606 0.38196419  0.9039353  0.8313131  0.8184088  0.8644077  0.9212268  0.2571464  0.6230487 0.82740141  0.9198212  0.6753081  0.9264388  0.8189317  0.2531644
## 200223270003_R06C01  0.2847748  0.1209622 0.02873309  0.6252849  0.8525281  0.7358417  0.3661324  0.5118209  0.6899734  0.8963047 0.09338396  0.8801892  0.9131513  0.8887753  0.8478185  0.2904433
## 200223270003_R07C01  0.2313285  0.2499647 0.38592071  0.9086932  0.8467857  0.7509296  0.3039272  0.9034373  0.6488005  0.9057159 0.79817238  0.9153264  0.6832952  0.9065432  0.8596400  0.9024530
##                     cg17386240 cg16527629 cg12434901 cg26757229 cg02823329 cg16858433 cg04768387 cg03628603 cg05059349 cg04577745 cg00648024 cg23840008 cg15399577 cg08397053 cg04970287 cg24638099
## 200223270003_R03C01  0.7144809  0.4365003  0.8458468  0.1422661  0.6464005  0.9194211  0.9465814  0.9157246 0.04507417  0.2681033 0.40202875 0.66547425  0.8785443 0.04199567  0.8875750  0.4262170
## 200223270003_R06C01  0.8074824  0.0708336  0.8299579  0.7933794  0.9633930  0.9271632  0.9098563  0.8851075 0.03898752  0.8570624 0.05579011 0.88483246  0.8703169 0.04437741  0.4651667  0.8787392
## 200223270003_R07C01  0.7227918  0.4492586  0.8482994  0.8074830  0.6617541  0.9288986  0.9413240  0.8923890 0.85329923  0.9002276 0.03708944 0.09020907  0.8968856 0.59796746  0.9092326  0.8682765
##                     cg10666341 cg05125667 cg14170504 cg05321907 cg20070588 cg20678988 cg10844498 cg12466610 cg15535896 cg04073914 cg11826549 cg26052728 cg06032337 cg10829391 cg27639199 cg06002867
## 200223270003_R03C01  0.6731062 0.54151552 0.02236650  0.1782629  0.5057088  0.8548886  0.1391318 0.59131778  0.9253926 0.03089677 0.04794983  0.1513937  0.5657198  0.5929616 0.67552763 0.84888752
## 200223270003_R06C01  0.6443180 0.49090787 0.02988245  0.8427929  0.8654344  0.7786685  0.1385549 0.06939623  0.3320191 0.89962516 0.03672380  0.5254754  0.5653758  0.9411947 0.06233093 0.02698175
## 200223270003_R07C01  0.8970292 0.01590936 0.48543531  0.8320504  0.8425849  0.8260541  0.7374725 0.04527733  0.9409104 0.47195215 0.51173417  0.5600724  0.5229594  0.9322956 0.05701332 0.48042117
##                     cg16431720  age.now cg20704148 cg18861767 cg17002338 cg20094343 cg11266396 cg12293347 cg25649515 cg22251955 cg15501526
## 200223270003_R03C01  0.8692449 78.60000 0.02409027  0.7847380  0.2684163  0.7128750 0.01905761  0.9253031 0.92357530 0.02254441  0.6319253
## 200223270003_R06C01  0.8773137 80.40000 0.02580923  0.4734572  0.2811103  0.3291595 0.53122014  0.9176094 0.58958387 0.02714054  0.7435100
## 200223270003_R07C01  0.8988328 78.16441 0.47357786  0.7312175  0.2706349  0.4013815 0.02421064  0.6028463 0.02958575 0.89577950  0.7756577
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Mean)
## [1] 315 251
print(selected_impAvg_ordered_NAME)
##   [1] "PC1"        "cg24861747" "cg06864789" "cg01013522" "cg04124201" "cg15775217" "cg27114706" "cg23698271" "cg25174111" "cg02356645" "cg14780448" "cg26739327" "cg23836570" "cg02078724" "cg18037388"
##  [16] "cg20507276" "cg12080266" "cg00999469" "cg07152869" "cg18339359" "cg16390578" "cg04242342" "cg01680303" "cg20218135" "cg25561557" "cg03172493" "PC2"        "cg19503462" "cg13885788" "cg18662228"
##  [31] "cg22274273" "cg06697310" "cg05096415" "cg07584620" "cg03084184" "cg11314779" "cg12279734" "cg06378561" "cg12012426" "cg17419220" "cg09650803" "cg10058204" "cg04109990" "cg12776173" "cg05841700"
##  [46] "cg11358878" "cg10542624" "cg06870118" "cg14252149" "cg10701746" "cg11787167" "cg23916408" "cg05749243" "cg22901347" "cg03982462" "cg12858518" "cg17044529" "cg09216282" "cg24859648" "cg02217425"
##  [61] "cg24065597" "cg17118775" "cg15591384" "cg17296678" "cg26901661" "cg27341708" "cg26983017" "cg12306781" "cg09584650" "cg24851651" "cg04867412" "PC3"        "cg04771146" "cg23350716" "cg05373298"
##  [76] "cg02901522" "cg02095601" "cg02872767" "cg19555075" "cg00421199" "cg00322003" "cg11716267" "cg18526121" "cg03392100" "cg22681945" "cg11834635" "cg12074150" "cg13226272" "cg26948066" "cg07456472"
##  [91] "cg22653957" "cg02389264" "cg12471283" "cg07138269" "cg02656016" "cg16268937" "cg10507965" "cg16715186" "cg02627240" "cg24104387" "cg12333628" "cg12689021" "cg03167407" "cg17623720" "cg25758034"
## [106] "cg18821122" "cg03115532" "cg09247979" "cg08584917" "cg13080267" "cg04218584" "cg27452255" "cg01280698" "cg08242313" "cg26007606" "cg04831745" "cg16089727" "cg12240569" "cg14924512" "cg03327352"
## [121] "cg03187614" "cg06012621" "cg11109139" "cg02932958" "cg04467639" "cg21575308" "cg04664583" "cg02495179" "cg14764203" "cg17906851" "cg19512141" "cg00156497" "cg16361249" "cg00977253" "cg21757617"
## [136] "cg15700429" "cg07951602" "cg16338321" "cg05377703" "cg11227702" "cg05161773" "cg02489327" "cg23432430" "cg14181112" "cg27187580" "cg12108278" "cg21533482" "cg02981548" "cg11173002" "cg10786572"
## [151] "cg20913114" "cg02302183" "cg00332268" "cg03359067" "cg03088219" "cg26889118" "cg00051154" "cg16536985" "cg06264882" "cg21986118" "cg04798314" "cg23813394" "cg16310958" "cg15730644" "cg05351360"
## [166] "cg11835797" "cg00841008" "cg12284872" "cg14465143" "cg00675157" "cg17348244" "cg07304760" "cg06624143" "cg26089705" "cg12702014" "cg04033559" "cg21501207" "cg14904299" "cg03057303" "cg12213037"
## [181] "cg22071943" "cg17429539" "cg21578644" "cg24422984" "cg13799572" "cg12556569" "cg12421087" "cg27286614" "cg07971231" "cg16733676" "cg27224751" "cg01130884" "cg16020483" "cg12925689" "cg05813498"
## [196] "cg19248407" "cg26474732" "cg05130642" "cg04845852" "cg16098618" "cg05138546" "cg17811452" "cg26081710" "cg01097733" "cg01608425" "cg17329602" "cg03640465" "cg17386240" "cg16527629" "cg12434901"
## [211] "cg26757229" "cg02823329" "cg16858433" "cg04768387" "cg03628603" "cg05059349" "cg04577745" "cg00648024" "cg23840008" "cg15399577" "cg08397053" "cg04970287" "cg24638099" "cg10666341" "cg05125667"
## [226] "cg14170504" "cg05321907" "cg20070588" "cg20678988" "cg10844498" "cg12466610" "cg15535896" "cg04073914" "cg11826549" "cg26052728" "cg06032337" "cg10829391" "cg27639199" "cg06002867" "cg16431720"
## [241] "age.now"    "cg20704148" "cg18861767" "cg17002338" "cg20094343" "cg11266396" "cg12293347" "cg25649515" "cg22251955" "cg15501526"
output_mean_process<-processed_data[,c("DX",selected_impAvg_ordered_NAME)]
print(head(output_mean_process))
## # A tibble: 6 × 251
##   DX            PC1 cg24861747 cg06864789 cg01013522 cg04124201 cg15775217 cg27114706 cg23698271 cg25174111 cg02356645 cg14780448 cg26739327 cg23836570 cg02078724 cg18037388 cg20507276 cg12080266
##   <fct>       <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CN       -0.173        0.431     0.461       0.886      0.331      0.917     0.936       0.911      0.857      0.583     0.670      0.769      0.543       0.290      0.755     0.387       0.945
## 2 CN       -0.00367      0.807     0.875       0.543      0.324      0.604     0.929       0.905      0.257      0.570     0.621      0.873      0.0327      0.281      0.729     0.480       0.936
## 3 Dementia -0.187        0.335     0.490       0.843      0.433      0.906     0.479       0.880      0.190      0.568     0.0443     0.834      0.599       0.274      0.239     0.0226      0.640
## 4 CN       -0.0379       0.600     0.0542      0.824      0.307      0.638     0.930       0.528      0.205      0.919     0.913      0.105      0.573       0.258      0.263     0.0357      0.575
## 5 Dementia -0.139        0.773     0.835       0.512      0.375      0.570     0.0419      0.910      0.866      0.907     0.911      0.757      0.918       0.273      0.845     0.235       0.539
## 6 CN       -0.213        0.731     0.374       0.492      0.373      0.886     0.947       0.903      0.203      0.895     0.655      0.0827     0.502       0.285      0.722     0.526       0.554
## # ℹ 233 more variables: cg00999469 <dbl>, cg07152869 <dbl>, cg18339359 <dbl>, cg16390578 <dbl>, cg04242342 <dbl>, cg01680303 <dbl>, cg20218135 <dbl>, cg25561557 <dbl>, cg03172493 <dbl>, PC2 <dbl>,
## #   cg19503462 <dbl>, cg13885788 <dbl>, cg18662228 <dbl>, cg22274273 <dbl>, cg06697310 <dbl>, cg05096415 <dbl>, cg07584620 <dbl>, cg03084184 <dbl>, cg11314779 <dbl>, cg12279734 <dbl>,
## #   cg06378561 <dbl>, cg12012426 <dbl>, cg17419220 <dbl>, cg09650803 <dbl>, cg10058204 <dbl>, cg04109990 <dbl>, cg12776173 <dbl>, cg05841700 <dbl>, cg11358878 <dbl>, cg10542624 <dbl>,
## #   cg06870118 <dbl>, cg14252149 <dbl>, cg10701746 <dbl>, cg11787167 <dbl>, cg23916408 <dbl>, cg05749243 <dbl>, cg22901347 <dbl>, cg03982462 <dbl>, cg12858518 <dbl>, cg17044529 <dbl>,
## #   cg09216282 <dbl>, cg24859648 <dbl>, cg02217425 <dbl>, cg24065597 <dbl>, cg17118775 <dbl>, cg15591384 <dbl>, cg17296678 <dbl>, cg26901661 <dbl>, cg27341708 <dbl>, cg26983017 <dbl>,
## #   cg12306781 <dbl>, cg09584650 <dbl>, cg24851651 <dbl>, cg04867412 <dbl>, PC3 <dbl>, cg04771146 <dbl>, cg23350716 <dbl>, cg05373298 <dbl>, cg02901522 <dbl>, cg02095601 <dbl>, cg02872767 <dbl>,
## #   cg19555075 <dbl>, cg00421199 <dbl>, cg00322003 <dbl>, cg11716267 <dbl>, cg18526121 <dbl>, cg03392100 <dbl>, cg22681945 <dbl>, cg11834635 <dbl>, cg12074150 <dbl>, cg13226272 <dbl>, …
dim(output_mean_process)
## [1] 315 251
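The mean-based selection above reduces to averaging the per-model scores, ordering decreasingly, and taking the top N. A minimal sketch with hypothetical scores:

```r
# Hypothetical normalized importance scores for two models.
imp <- data.frame(
  Feature        = c("cgA", "cgB", "cgC", "cgD"),
  Importance_RF  = c(0.9, 0.2, 0.7, 0.4),
  Importance_SVM = c(0.8, 0.3, 0.5, 0.6)
)
imp$Average_Importance <- rowMeans(imp[, c("Importance_RF", "Importance_SVM")])
imp_ordered <- imp[order(-imp$Average_Importance), ]    # descending by mean
top_names   <- head(imp_ordered$Feature, n = 2)         # keep top N = 2
print(top_names)                                        # [1] "cgA" "cgC"
```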

Based on Median

Selected_median_imp <- head(combined_importance_quantiles,n = Number_fea_input)
print(head(Selected_median_imp))
##        Feature        0%        25%       50%       75%      100%
## 256 cg24861747 0.1692723 0.29566395 0.8000000 0.9163772 0.9680001
## 280        PC1 0.1064562 0.60447256 0.8000000 1.0000000 1.0000000
## 13  cg01013522 0.2492151 0.35428316 0.6198419 0.8000000 0.9395130
## 55  cg04242342 0.1071483 0.18656325 0.6000000 0.6130838 0.6781069
## 89  cg06864789 0.2398837 0.33511358 0.6000000 0.9055556 0.9470049
## 246 cg23836570 0.0000000 0.09745053 0.6000000 0.6486526 1.0000000
Selected_median_imp_Name<-Selected_median_imp$Feature
print(head(Selected_median_imp_Name))
## [1] "cg24861747" "PC1"        "cg01013522" "cg04242342" "cg06864789" "cg23836570"
df_selected_Median <- processed_dataFrame[,c("DX",Selected_median_imp_Name)]
output_median_feature<-processed_data[,c("DX",Selected_median_imp_Name)]
  
print(head(df_selected_Median))
##                           DX cg24861747          PC1 cg01013522 cg04242342 cg06864789 cg23836570 cg25174111 cg02356645 cg04124201 cg00999469         PC2 cg23698271 cg14780448 cg15775217 cg26739327
## 200223270003_R03C01       CN  0.4309505 -0.172761185  0.8862821  0.8167892  0.4605312 0.54259383  0.8573844  0.5833923  0.3308589  0.2857719  0.05745834  0.9109565 0.67021018  0.9168327  0.7693268
## 200223270003_R06C01       CN  0.8071462 -0.003667305  0.5425308  0.8040357  0.8751365 0.03267304  0.2567745  0.5701428  0.3241613  0.2499229  0.08372861  0.9051701 0.62073547  0.6042521  0.8727608
## 200223270003_R07C01 Dementia  0.3347317 -0.186779607  0.8429862  0.8286115  0.4902033 0.59939745  0.1903803  0.5683381  0.4332693  0.2819622 -0.01117250  0.8804362 0.04425741  0.9062231  0.8340445
##                     cg12279734 cg22274273 cg16390578 cg09650803 cg18339359 cg07152869 cg03172493 cg06697310 cg13885788 cg18037388 cg20507276  cg22901347 cg25561557 cg27114706          PC3 cg19503462
## 200223270003_R03C01  0.1494651  0.4246379 0.20983422  0.8954464  0.9040272   0.505063 0.63362492  0.8653044  0.9369476  0.7545086 0.38721972 0.001690332 0.03851635  0.9359259  0.005055871  0.4537684
## 200223270003_R06C01  0.8760759  0.4196796 0.06389068  0.9113477  0.8552121   0.835249 0.06148804  0.2405168  0.5163017  0.7294565 0.47978438 0.103413834 0.47259480  0.9285384  0.029143653  0.6997359
## 200223270003_R07C01  0.8674214  0.4164100 0.23101450  0.2518414  0.3073106   0.519430 0.64562298  0.8479193  0.9183376  0.2391659 0.02261996 0.632991482 0.43364249  0.4787397 -0.032302430  0.7189778
##                     cg26983017 cg09216282 cg10542624 cg11787167 cg02095601 cg02078724 cg02872767 cg18662228 cg23916408 cg04109990 cg20218135 cg12858518 cg03982462 cg01680303 cg03084184 cg06870118
## 200223270003_R03C01 0.03145466  0.9244259 0.02189577 0.04673831  0.9161259  0.2896133  0.3886537  0.8730153  0.9154993  0.6476604 0.64278153  0.9285252  0.6023731  0.1344941  0.7877128  0.8100144
## 200223270003_R06C01 0.84677625  0.9263996 0.54330620 0.32564508  0.2233062  0.2805612  0.9099575  0.8602464  0.8886255  0.6692040 0.06509247  0.9017533  0.8778458  0.7573869  0.4546397  0.7802055
## 200223270003_R07C01 0.53922255  0.9352308 0.54991492 0.43162543  0.8978191  0.2739571  0.8603283  0.8683578  0.8872447  0.9024920 0.65642359  0.9187879  0.8860227  0.4772204  0.7812413  0.7917257
##                     cg12306781 cg00322003 cg12471283 cg12080266 cg06378561 cg14252149 cg27452255 cg11358878 cg00421199 cg05749243 cg17118775 cg24859648 cg13799572 cg00977253 cg15591384 cg02932958
## 200223270003_R03C01  0.8663817  0.5702070  0.8658731  0.9450629  0.9377503 0.02450779  0.6593379 0.83252951  0.8532461  0.9209685  0.5585676 0.44392797  0.8449584  0.9145988  0.7870275  0.4210489
## 200223270003_R06C01  0.8027798  0.3077122  0.6963410  0.9363381  0.5154019 0.02382413  0.9012217 0.87521203  0.8891803  0.9143061  0.2916054 0.03341185  0.4409219  0.8944518  0.7429614  0.3825995
## 200223270003_R07C01  0.8787250  0.6104341  0.6680611  0.6398247  0.9403569 0.56212480  0.8898635 0.08917903  0.8937751  0.9121180  0.2868948 0.43582347  0.8516975  0.9150206  0.8346279  0.7617081
##                     cg12108278 cg07584620 cg00841008 cg23432430 cg03392100 cg16715186 cg05096415 cg08584917 cg08242313 cg02389264 cg09584650 cg16268937 cg26948066 cg26757229 cg04218584 cg05373298
## 200223270003_R03C01  0.9243869  0.3763980 0.61899333  0.9455418  0.9227394  0.7946153  0.5177819  0.9019732  0.8953645  0.7900942 0.09661586  0.8931712  0.5026045  0.1422661  0.8971263 0.02652391
## 200223270003_R06C01  0.9068995  0.8530961 0.05401588  0.9418716  0.8902340  0.8124316  0.6288426  0.9187789  0.8573493  0.7789974 0.52399749  0.9034556  0.9101976  0.7933794  0.8491768 0.83538124
## 200223270003_R07C01  0.9131367  0.3888623 0.90769205  0.9426559  0.4359657  0.7773263  0.6060271  0.6007449  0.8992114  0.4174463 0.11587211  0.8928450  0.9379543  0.8074830  0.9008137 0.89506024
##                     cg05841700 cg26474732 cg27286614 cg10701746 cg26901661 cg06624143 cg18821122 cg17044529 cg11173002 cg15399577 cg16338321 cg20913114 cg03115532 cg04831745 cg19555075 cg02901522
## 200223270003_R03C01  0.9146488  0.8184088  0.5933858  0.4868342  0.8754981  0.4899758  0.5901603  0.9117895  0.5913599  0.8785443  0.8294062 0.80382984  0.8659608 0.71214149  0.4921409  0.9372901
## 200223270003_R06C01  0.3737990  0.7358417  0.6348795  0.4927257  0.9021064  0.9107688  0.5779620  0.9290636  0.1878736  0.8703169  0.4918708 0.03158439  0.8533871 0.06871768  0.4261618  0.4954978
## 200223270003_R07C01  0.5046468  0.7509296  0.9468370  0.8552180  0.8556831  0.9217350  0.9251431  0.9402858  0.5150840  0.8968856  0.5245645 0.81256840  0.4416574 0.90994644  0.4694729  0.9381188
##                     cg24104387 cg21501207 cg12702014 cg01280698 cg15730644 cg02217425 cg14924512 cg04798314 cg11314779 cg00675157 cg11247378 cg12556569 cg23161429 cg05059349 cg02494911 cg24065597
## 200223270003_R03C01  0.5339034  0.6813712  0.7848681 0.88462009  0.4353906  0.1032503  0.9160885 0.07119798  0.8966100  0.9242325  0.7874849 0.03924599  0.9099619 0.04507417  0.2416332  0.2221098
## 200223270003_R06C01  0.3007614  0.4747229  0.8065993 0.88471320  0.8763048  0.6592850  0.9088414 0.09248843  0.8908661  0.9254708  0.4807942 0.48636893  0.8833895 0.03898752  0.2520909  0.7036129
## 200223270003_R07C01  0.7509780  0.7422003  0.7458594 0.06370005  0.4833709  0.8792021  0.9081681 0.06972566  0.9048316  0.5447244  0.4537348 0.46498877  0.9134709 0.85329923  0.2457032  0.2407676
##                     cg14904299 cg19512141 cg21533482 cg16098618 cg16858433 cg17623720 cg23350716 cg12240569 cg13226272 cg06536614 cg04467639 cg26007606 cg06264882 cg10666341 cg03640465 cg04970287
## 200223270003_R03C01  0.2712472  0.7903543  0.8288469  0.2571464  0.9194211  0.8988624  0.7876873 0.02690547  0.5410002  0.5746694  0.6400206  0.5615550 0.43678655  0.6731062  0.2531644  0.8875750
## 200223270003_R06C01  0.8364544  0.8404684  0.6766373  0.6899734  0.9271632  0.8172384  0.6960544 0.46030640  0.4437070  0.5773468  0.5657041  0.1463111 0.43703442  0.6443180  0.2904433  0.4651667
## 200223270003_R07C01  0.8193867  0.2202759  0.6235932  0.6488005  0.9288986  0.8226085  0.7387498 0.86185839  0.0265215  0.5848917  0.6302917  0.8101800 0.02439581  0.8970292  0.9024530  0.9092326
##                     cg11706829 cg21578644 cg17386240 cg21986118 cg02302183 cg05321907 cg14764203 cg15700429 cg13080267 cg11331837 cg11834635 cg17419220 cg10058204 cg24851651 cg07971231 cg10507965
## 200223270003_R03C01  0.5444785  0.9260863  0.7144809  0.6571296  0.9191148  0.1782629  0.4683709  0.9114530 0.78371483 0.57150125  0.8880887 0.43470227  0.5834496 0.05358297  0.8406145  0.4010973
## 200223270003_R06C01  0.5669449  0.9159726  0.8074824  0.7034445  0.8749250  0.8427929  0.8916566  0.8838233 0.09436069 0.03182862  0.2493491 0.02781411  0.0549494 0.05968923  0.8447914  0.4033691
## 200223270003_R07C01  0.8746449  0.9178001  0.7227918  0.9055894  0.8888247  0.8320504  0.8714472  0.9095363 0.09351259 0.03832164  0.2210428 0.42803809  0.5689591 0.60864179  0.8874706  0.3869543
##                     cg26889118 cg22071943 cg18526121 cg07304760 cg00648024 cg17329602 cg22653957 cg16361249 cg05455372 cg02495179 cg05377703 cg02656016 cg11227702 cg27187580 cg10786572 cg06875704
## 200223270003_R03C01  0.9154836  0.2442648  0.4762313  0.5798534 0.40202875  0.8189317  0.6442184 0.52843073  0.5532370  0.7373055  0.8213047  0.2355680 0.49184121  0.6643576  0.5982086  0.9181165
## 200223270003_R06C01  0.9101336  0.2644581  0.4833367  0.5575516 0.05579011  0.8478185  0.9531308 0.09039669  0.6375708  0.5588114  0.5152514  0.9052318 0.02543724  0.6914924  0.0935115  0.9200461
## 200223270003_R07C01  0.5759967  0.2599947  0.7761450  0.9195617 0.03708944  0.8596400  0.6534542 0.42039062  0.8095964  0.5273309  0.7773036  0.8653682 0.45150971  0.9357074  0.8436837  0.9048289
##                     cg02981548 cg04577745 cg12434901 cg12421087 cg11835797 cg27224751 cg02627240 cg11109139 cg07456472 cg09247979 cg07138269 cg01802772 cg09518270 cg17429539 cg12776173 cg26052728
## 200223270003_R03C01  0.5220037  0.2681033  0.8458468  0.5399655  0.9007408 0.03214912 0.57129408  0.6350109  0.5856904  0.5706177  0.9426707 0.02361869  0.8870663  0.7100923  0.8730635  0.1513937
## 200223270003_R06C01  0.5098965  0.8570624  0.8299579  0.5400348  0.8944957 0.83123722 0.05309659  0.6904482  0.3886482  0.5090215  0.5057781 0.02401520  0.8765622  0.7660838  0.7009491  0.5254754
## 200223270003_R07C01  0.5660985  0.9002276  0.8482994  0.5291975  0.8168544 0.79732117 0.52179136  0.6274025  0.9186405  0.5066661  0.9400527 0.02200957  0.8135001  0.6984969  0.1136716  0.5600724
##                     cg03628603 cg15501526 cg14465143 cg01130884 cg08397053 cg11716267 cg12074150 cg00051154 cg18861767 cg25758034 cg21575308 cg03327352 cg03057303 cg04073914 cg04664583 cg00156497
## 200223270003_R03C01  0.9157246  0.6319253  0.5543068  0.6230659 0.04199567 0.04959702 0.18602738 0.08370609  0.7847380  0.6649219 0.44702405  0.8786878  0.8923039 0.03089677  0.5881190  0.5194653
## 200223270003_R06C01  0.8851075  0.7435100  0.2702875  0.2847748 0.04437741 0.49143010 0.14231506 0.61288950  0.4734572  0.2393844 0.44792570  0.3042310  0.4954311 0.89962516  0.9352717  0.9024063
## 200223270003_R07C01  0.8923890  0.7756577  0.2621492  0.2313285 0.59796746 0.45857830 0.09201303 0.07638127  0.7312175  0.7071501 0.02822675  0.8273211  0.4695066 0.47195215  0.9350230  0.9067989
##                     cg17002338 cg04845852 cg12738248 cg12466610 cg14609402 cg01097733 cg12012426 cg04033559 cg17811452 cg16310958 cg20300784 cg02489327 cg23813394 cg00332268 cg06012621 cg23840008
## 200223270003_R03C01  0.2684163  0.9212268 0.88010292 0.59131778  0.9087631  0.6753081  0.9434768  0.8768243 0.82740141  0.9300073 0.86609999  0.8616312 0.48811365  0.9044887  0.8579519 0.66547425
## 200223270003_R06C01  0.2811103  0.5118209 0.51121855 0.06939623  0.9109735  0.9131513  0.9220044  0.8257388 0.09338396  0.9228871 0.03091187  0.8777949 0.02943436  0.5777209  0.5325037 0.88483246
## 200223270003_R07C01  0.2706349  0.9034373 0.09131476 0.04527733  0.9099145  0.6832952  0.9241284  0.8900962 0.79817238  0.8539019 0.90319796  0.4205073 0.92935625  0.5848006  0.6263080 0.09020907
##                     cg27341708 cg20094343 cg27577781 cg22681945 cg03167407 cg16089727 cg02823329 cg23947654 cg04768387 cg10844498 cg03359067 cg14170504 cg17906851 cg12333628 cg12284872 cg05351360
## 200223270003_R03C01 0.02613847  0.7128750  0.8113185  0.8388195  0.7610292 0.54996692  0.6464005  0.8079296  0.9465814  0.1391318  0.8628564 0.02236650  0.9529718  0.9092861  0.7414569 0.03855181
## 200223270003_R06C01 0.86893582  0.3291595  0.8144274  0.8700500  0.3087606 0.05876736  0.9633930  0.8017579  0.9098563  0.1385549  0.8144536 0.02988245  0.6462151  0.5084647  0.7725267 0.76395533
## 200223270003_R07C01 0.02642300  0.4013815  0.7970617  0.3344105  0.2455453 0.85485461  0.6617541  0.7584946  0.9413240  0.7374725  0.8737908 0.48543531  0.9553497  0.5229394  0.7573369 0.77000888
##                     cg19248407 cg15535896 cg24422984 cg18310072 cg27639199 cg26081710 cg06032337 cg04771146 cg24638099 cg18029737 cg09993718 cg04867412 cg12689021 cg20070588 cg16020483 cg14181112
## 200223270003_R03C01  0.8313131  0.9253926  0.5462594  0.1449858 0.67552763  0.9198212  0.5657198  0.7648566  0.4262170  0.9016634  0.7227856  0.8796800  0.7449475  0.5057088  0.1673606  0.1615405
## 200223270003_R06C01  0.8525281  0.3320191  0.5193121  0.9321264 0.06233093  0.8801892  0.5653758  0.3125007  0.8787392  0.7376586  0.4378752  0.4497115  0.7872237  0.8654344  0.1209622  0.3424621
## 200223270003_R07C01  0.8467857  0.9409104  0.1970387  0.9108063 0.05701332  0.9153264  0.5229594  0.2909958  0.8682765  0.9397667  0.7067889  0.4445373  0.7523141  0.8425849  0.2499647  0.2178314
##                     cg01608425 cg10829391 cg13375589 cg05161773 cg21757617 cg05125667 cg10985055 cg17348244 cg12293347 cg16733676 cg05813498
## 200223270003_R03C01  0.9264388  0.5929616  0.4578240  0.4154907  0.4429909 0.54151552  0.8631895 0.81793075  0.9253031  0.8904541  0.9039353
## 200223270003_R06C01  0.8887753  0.9411947  0.6025638  0.8526849  0.4472538 0.49090787  0.5456633 0.07241099  0.9176094  0.1698111  0.6252849
## 200223270003_R07C01  0.9065432  0.9322956  0.8182629  0.4259275  0.4339315 0.01590936  0.8825100 0.78025001  0.6028463  0.9203317  0.9086932
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Median)
## [1] 315 251
print(Selected_median_imp_Name)
##   [1] "cg24861747" "PC1"        "cg01013522" "cg04242342" "cg06864789" "cg23836570" "cg25174111" "cg02356645" "cg04124201" "cg00999469" "PC2"        "cg23698271" "cg14780448" "cg15775217" "cg26739327"
##  [16] "cg12279734" "cg22274273" "cg16390578" "cg09650803" "cg18339359" "cg07152869" "cg03172493" "cg06697310" "cg13885788" "cg18037388" "cg20507276" "cg22901347" "cg25561557" "cg27114706" "PC3"       
##  [31] "cg19503462" "cg26983017" "cg09216282" "cg10542624" "cg11787167" "cg02095601" "cg02078724" "cg02872767" "cg18662228" "cg23916408" "cg04109990" "cg20218135" "cg12858518" "cg03982462" "cg01680303"
##  [46] "cg03084184" "cg06870118" "cg12306781" "cg00322003" "cg12471283" "cg12080266" "cg06378561" "cg14252149" "cg27452255" "cg11358878" "cg00421199" "cg05749243" "cg17118775" "cg24859648" "cg13799572"
##  [61] "cg00977253" "cg15591384" "cg02932958" "cg12108278" "cg07584620" "cg00841008" "cg23432430" "cg03392100" "cg16715186" "cg05096415" "cg08584917" "cg08242313" "cg02389264" "cg09584650" "cg16268937"
##  [76] "cg26948066" "cg26757229" "cg04218584" "cg05373298" "cg05841700" "cg26474732" "cg27286614" "cg10701746" "cg26901661" "cg06624143" "cg18821122" "cg17044529" "cg11173002" "cg15399577" "cg16338321"
##  [91] "cg20913114" "cg03115532" "cg04831745" "cg19555075" "cg02901522" "cg24104387" "cg21501207" "cg12702014" "cg01280698" "cg15730644" "cg02217425" "cg14924512" "cg04798314" "cg11314779" "cg00675157"
## [106] "cg11247378" "cg12556569" "cg23161429" "cg05059349" "cg02494911" "cg24065597" "cg14904299" "cg19512141" "cg21533482" "cg16098618" "cg16858433" "cg17623720" "cg23350716" "cg12240569" "cg13226272"
## [121] "cg06536614" "cg04467639" "cg26007606" "cg06264882" "cg10666341" "cg03640465" "cg04970287" "cg11706829" "cg21578644" "cg17386240" "cg21986118" "cg02302183" "cg05321907" "cg14764203" "cg15700429"
## [136] "cg13080267" "cg11331837" "cg11834635" "cg17419220" "cg10058204" "cg24851651" "cg07971231" "cg10507965" "cg26889118" "cg22071943" "cg18526121" "cg07304760" "cg00648024" "cg17329602" "cg22653957"
## [151] "cg16361249" "cg05455372" "cg02495179" "cg05377703" "cg02656016" "cg11227702" "cg27187580" "cg10786572" "cg06875704" "cg02981548" "cg04577745" "cg12434901" "cg12421087" "cg11835797" "cg27224751"
## [166] "cg02627240" "cg11109139" "cg07456472" "cg09247979" "cg07138269" "cg01802772" "cg09518270" "cg17429539" "cg12776173" "cg26052728" "cg03628603" "cg15501526" "cg14465143" "cg01130884" "cg08397053"
## [181] "cg11716267" "cg12074150" "cg00051154" "cg18861767" "cg25758034" "cg21575308" "cg03327352" "cg03057303" "cg04073914" "cg04664583" "cg00156497" "cg17002338" "cg04845852" "cg12738248" "cg12466610"
## [196] "cg14609402" "cg01097733" "cg12012426" "cg04033559" "cg17811452" "cg16310958" "cg20300784" "cg02489327" "cg23813394" "cg00332268" "cg06012621" "cg23840008" "cg27341708" "cg20094343" "cg27577781"
## [211] "cg22681945" "cg03167407" "cg16089727" "cg02823329" "cg23947654" "cg04768387" "cg10844498" "cg03359067" "cg14170504" "cg17906851" "cg12333628" "cg12284872" "cg05351360" "cg19248407" "cg15535896"
## [226] "cg24422984" "cg18310072" "cg27639199" "cg26081710" "cg06032337" "cg04771146" "cg24638099" "cg18029737" "cg09993718" "cg04867412" "cg12689021" "cg20070588" "cg16020483" "cg14181112" "cg01608425"
## [241] "cg10829391" "cg13375589" "cg05161773" "cg21757617" "cg05125667" "cg10985055" "cg17348244" "cg12293347" "cg16733676" "cg05813498"
print(head(output_median_feature))
## # A tibble: 6 × 251
##   DX    cg24861747      PC1 cg01013522 cg04242342 cg06864789 cg23836570 cg25174111 cg02356645 cg04124201 cg00999469     PC2 cg23698271 cg14780448 cg15775217 cg26739327 cg12279734 cg22274273 cg16390578
##   <fct>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CN         0.431 -0.173        0.886     0.817      0.461      0.543       0.857      0.583      0.331      0.286  0.0575      0.911     0.670       0.917     0.769       0.149     0.425      0.210 
## 2 CN         0.807 -0.00367      0.543     0.804      0.875      0.0327      0.257      0.570      0.324      0.250  0.0837      0.905     0.621       0.604     0.873       0.876     0.420      0.0639
## 3 Deme…      0.335 -0.187        0.843     0.829      0.490      0.599       0.190      0.568      0.433      0.282 -0.0112      0.880     0.0443      0.906     0.834       0.867     0.416      0.231 
## 4 CN         0.600 -0.0379       0.824     0.443      0.0542     0.573       0.205      0.919      0.307      0.297  0.0157      0.528     0.913       0.638     0.105       0.866     0.0230     0.213 
## 5 Deme…      0.773 -0.139        0.512     0.419      0.835      0.918       0.866      0.907      0.375      0.290  0.0299      0.910     0.911       0.570     0.757       0.610     0.413      0.245 
## 6 CN         0.731 -0.213        0.492     0.0282     0.374      0.502       0.203      0.895      0.373      0.927  0.0518      0.903     0.655       0.886     0.0827      0.600     0.0300     0.871 
## # ℹ 232 more variables: cg09650803 <dbl>, cg18339359 <dbl>, cg07152869 <dbl>, cg03172493 <dbl>, cg06697310 <dbl>, cg13885788 <dbl>, cg18037388 <dbl>, cg20507276 <dbl>, cg22901347 <dbl>,
## #   cg25561557 <dbl>, cg27114706 <dbl>, PC3 <dbl>, cg19503462 <dbl>, cg26983017 <dbl>, cg09216282 <dbl>, cg10542624 <dbl>, cg11787167 <dbl>, cg02095601 <dbl>, cg02078724 <dbl>, cg02872767 <dbl>,
## #   cg18662228 <dbl>, cg23916408 <dbl>, cg04109990 <dbl>, cg20218135 <dbl>, cg12858518 <dbl>, cg03982462 <dbl>, cg01680303 <dbl>, cg03084184 <dbl>, cg06870118 <dbl>, cg12306781 <dbl>,
## #   cg00322003 <dbl>, cg12471283 <dbl>, cg12080266 <dbl>, cg06378561 <dbl>, cg14252149 <dbl>, cg27452255 <dbl>, cg11358878 <dbl>, cg00421199 <dbl>, cg05749243 <dbl>, cg17118775 <dbl>,
## #   cg24859648 <dbl>, cg13799572 <dbl>, cg00977253 <dbl>, cg15591384 <dbl>, cg02932958 <dbl>, cg12108278 <dbl>, cg07584620 <dbl>, cg00841008 <dbl>, cg23432430 <dbl>, cg03392100 <dbl>,
## #   cg16715186 <dbl>, cg05096415 <dbl>, cg08584917 <dbl>, cg08242313 <dbl>, cg02389264 <dbl>, cg09584650 <dbl>, cg16268937 <dbl>, cg26948066 <dbl>, cg26757229 <dbl>, cg04218584 <dbl>,
## #   cg05373298 <dbl>, cg05841700 <dbl>, cg26474732 <dbl>, cg27286614 <dbl>, cg10701746 <dbl>, cg26901661 <dbl>, cg06624143 <dbl>, cg18821122 <dbl>, cg17044529 <dbl>, cg11173002 <dbl>, …

Based on Frequency

Function for Frequency Selection

Choose a feature as mutually important when it appears in at least half of the models (i.e. 3 of 5 in our case).

The frequency / common feature importance is computed as follows:

  1. Select the Top Number of features for each model (this number is set to “Number_fea_input” in this session, Number_fea_input <- INPUT_NUMBER_FEATURES, where “INPUT_NUMBER_FEATURES” is defined in the INPUT session).
  2. Calculate how frequently each feature appears among the Top Number of features selected in step 1.
  3. Consider a feature important when it appears at least half of the time, and collect these important features as common features.
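The three steps above can be sketched on toy data. The feature names and the three-model set below are hypothetical, chosen only to illustrate the counting rule; the real tables use the five trained models:

```r
# Hypothetical Top-N feature lists from three models (toy data for illustration)
top_lists <- list(
  LRM = c("cg_A", "cg_B", "cg_C"),
  XGB = c("cg_A", "cg_C", "cg_D"),
  RF  = c("cg_B", "cg_C", "cg_E")
)

# Step 2: frequency of appearance of each feature across the Top-N lists
all_feats <- unique(unlist(top_lists, use.names = FALSE))
counts <- sapply(all_feats, function(f) sum(sapply(top_lists, function(top) f %in% top)))

# Step 3: keep features that appear in at least half of the models (>= 2 of 3 here)
common <- names(counts)[counts >= ceiling(length(top_lists) / 2)]
print(common)  # "cg_A" "cg_B" "cg_C"
```

With five models, the same threshold becomes `ceiling(5 / 2) = 3`, matching the "3 in our case" rule stated above.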
n_select_frequencyWay <- Number_fea_input
combined_importance_freq_ordered_df <- combined_importance_Avg_ordered
df_Selected_Frequency_Imp <- function(n_select_frequencyWay, FeatureImportanceTable){
# In this function, we take the feature importance data frame as input
# and apply the steps discussed above.
# The output is the feature frequency table,
# i.e. how often each feature appears among the Top Number of features
# selected per model.

# Importance column for each model
importance_cols <- c(LRM = "Importance_LRM1", XGB = "Importance_XGB",
                     ENM = "Importance_ENM1", RF  = "Importance_RF",
                     SVM = "Importance_SVM")

# Top-N feature names per model, ordered by that model's importance
top_names <- lapply(importance_cols, function(col){
  ordered <- FeatureImportanceTable[order(-FeatureImportanceTable[[col]]), ]
  head(ordered, n = n_select_frequencyWay)$Feature
})

# Combine all selected features into a unique collection
all_features <- unique(unlist(top_names, use.names = FALSE))

# Presence (1) or absence (0) of each feature in each model's Top-N list
feature_matrix <- sapply(top_names, function(nms) as.integer(all_features %in% nms))
rownames(feature_matrix) <- all_features

# Convert the matrix to a data frame and count appearances per feature
feature_df <- as.data.frame(feature_matrix)
feature_df$Total_Count <- rowSums(feature_df)

# Sort the data frame by Total_Count in descending order
feature_df <- feature_df[order(-feature_df$Total_Count), ]
print(feature_df)
return(feature_df)
}

Now, the function will be tested below:

df_Func_test<-df_Selected_Frequency_Imp(NUM_COMMON_FEATURES_SET_Frequency,combined_importance_freq_ordered_df)
##            LRM XGB ENM RF SVM Total_Count
## cg01013522   1   1   1  0   1           4
## cg06864789   1   1   1  1   0           4
## PC1          1   0   1  0   1           3
## cg01680303   1   0   1  0   1           3
## cg02356645   1   1   1  0   0           3
## cg26739327   1   1   1  0   0           3
## cg24861747   0   1   0  1   1           3
## cg15775217   0   1   1  0   1           3
## PC2          1   0   1  0   0           2
## cg02872767   1   0   1  0   0           2
## cg11787167   1   0   1  0   0           2
## cg09216282   1   0   1  0   0           2
## cg12080266   1   0   0  1   0           2
## cg19503462   1   0   1  0   0           2
## cg07152869   1   0   1  0   0           2
## cg12858518   1   0   1  0   0           2
## cg26757229   1   0   1  0   0           2
## cg25174111   0   1   0  1   0           2
## cg20507276   0   1   0  1   0           2
## cg04124201   0   0   1  1   0           2
## cg12012426   0   0   0  1   1           2
## cg17419220   0   0   0  1   1           2
## cg12776173   0   0   0  1   1           2
## cg06378561   1   0   0  0   0           1
## cg12108278   1   0   0  0   0           1
## cg03084184   1   0   0  0   0           1
## cg14780448   1   0   0  0   0           1
## cg02932958   1   0   0  0   0           1
## cg23836570   0   1   0  0   0           1
## cg23698271   0   1   0  0   0           1
## cg00999469   0   1   0  0   0           1
## cg16390578   0   1   0  0   0           1
## cg13885788   0   1   0  0   0           1
## cg27114706   0   1   0  0   0           1
## cg25561557   0   1   0  0   0           1
## cg18037388   0   1   0  0   0           1
## cg24859648   0   1   0  0   0           1
## cg03172493   0   1   0  0   0           1
## cg04242342   0   1   0  0   0           1
## cg06697310   0   1   0  0   0           1
## PC3          0   0   1  0   0           1
## cg04109990   0   0   1  0   0           1
## cg03982462   0   0   1  0   0           1
## cg06870118   0   0   1  0   0           1
## cg12333628   0   0   0  1   0           1
## cg17296678   0   0   0  1   0           1
## cg17118775   0   0   0  1   0           1
## cg18339359   0   0   0  1   0           1
## cg21575308   0   0   0  1   0           1
## cg03167407   0   0   0  1   0           1
## cg05841700   0   0   0  1   0           1
## cg06264882   0   0   0  1   0           1
## cg03115532   0   0   0  1   0           1
## cg07951602   0   0   0  1   0           1
## cg00051154   0   0   0  1   0           1
## cg02078724   0   0   0  0   1           1
## cg20218135   0   0   0  0   1           1
## cg18662228   0   0   0  0   1           1
## cg07584620   0   0   0  0   1           1
## cg11314779   0   0   0  0   1           1
## cg10058204   0   0   0  0   1           1
## cg11358878   0   0   0  0   1           1
## cg10701746   0   0   0  0   1           1
## cg27341708   0   0   0  0   1           1
## cg04867412   0   0   0  0   1           1
## cg04771146   0   0   0  0   1           1
## cg02901522   0   0   0  0   1           1
# The expected output is zero, i.e. the function reproduces the earlier result.
sum(df_Func_test!=frequency_feature_df_RAW_ordered)
## [1] 0

Selected data frame based on Frequency for Output

Choose a feature as mutually important when it appears in at least half of the models (i.e. 3 of 5 in our case).

The frequency / common feature importance is computed as follows:

  1. Select the Top Number of features for each model (this number is set to “Number_fea_input” in this session, Number_fea_input <- INPUT_NUMBER_FEATURES, where “INPUT_NUMBER_FEATURES” is defined in the INPUT session).
  2. Calculate how frequently each feature appears among the Top Number of features selected in step 1.
  3. Consider a feature important when it appears at least half of the time, and collect these important features as common features.
n_select_frequencyWay <- Number_fea_input
df_feature_Output_frequency <- df_Selected_Frequency_Imp(Number_fea_input,
                                                         combined_importance_freq_ordered_df)
##            LRM XGB ENM RF SVM Total_Count
## PC1          1   1   1  1   1           5
## PC2          1   1   1  1   1           5
## cg11787167   1   1   1  1   1           5
## cg09216282   1   1   1  1   1           5
## cg01680303   1   1   1  1   1           5
## cg12080266   1   1   1  1   1           5
## cg19503462   1   1   1  1   1           5
## cg02356645   1   1   1  1   1           5
## cg06378561   1   1   1  1   1           5
## cg07152869   1   1   1  1   1           5
## cg03084184   1   1   1  1   1           5
## cg01013522   1   1   1  1   1           5
## cg26739327   1   1   1  1   1           5
## cg06864789   1   1   1  1   1           5
## cg14780448   1   1   1  1   1           5
## cg02932958   1   1   1  1   1           5
## cg12858518   1   1   1  1   1           5
## cg04124201   1   1   1  1   1           5
## cg03982462   1   1   1  1   1           5
## cg12306781   1   1   1  1   1           5
## cg23432430   1   1   1  1   1           5
## cg00322003   1   1   1  1   1           5
## cg04109990   1   1   1  1   1           5
## cg27114706   1   1   1  1   1           5
## cg15775217   1   1   1  1   1           5
## cg20218135   1   1   1  1   1           5
## cg03392100   1   1   1  1   1           5
## cg17044529   1   1   1  1   1           5
## cg27452255   1   1   1  1   1           5
## cg02078724   1   1   1  1   1           5
## cg05096415   1   1   1  1   1           5
## cg20507276   1   1   1  1   1           5
## cg25561557   1   1   1  1   1           5
## cg17623720   1   1   1  1   1           5
## cg17118775   1   1   1  1   1           5
## cg12471283   1   1   1  1   1           5
## cg00421199   1   1   1  1   1           5
## cg02217425   1   1   1  1   1           5
## cg16338321   1   1   1  1   1           5
## cg20913114   1   1   1  1   1           5
## cg14764203   1   1   1  1   1           5
## cg15730644   1   1   1  1   1           5
## cg16715186   1   1   1  1   1           5
## cg24861747   1   1   1  1   1           5
## cg09584650   1   1   1  1   1           5
## cg09650803   1   1   1  1   1           5
## cg23698271   1   1   1  1   1           5
## cg12702014   1   1   1  1   1           5
## cg22901347   1   1   1  1   1           5
## cg07584620   1   1   1  1   1           5
## cg13799572   1   1   1  1   1           5
## cg18339359   1   1   1  1   1           5
## cg22274273   1   1   1  1   1           5
## cg10701746   1   1   1  1   1           5
## cg04798314   1   1   1  1   1           5
## cg01280698   1   1   1  1   1           5
## cg05749243   1   1   1  1   1           5
## cg26474732   1   1   1  1   1           5
## cg06870118   1   1   1  1   1           5
## cg15700429   1   1   1  1   1           5
## cg24065597   1   1   1  1   1           5
## cg05841700   1   1   1  1   1           5
## cg03640465   1   1   1  1   1           5
## cg18526121   1   1   1  1   1           5
## cg21575308   1   1   1  1   1           5
## cg11716267   1   1   1  1   1           5
## cg15591384   1   1   1  1   1           5
## cg10786572   1   1   1  1   1           5
## cg21578644   1   1   1  1   1           5
## cg07138269   1   1   1  1   1           5
## cg24104387   1   1   1  1   1           5
## cg12240569   1   1   1  1   1           5
## cg04218584   1   1   1  1   1           5
## cg21501207   1   1   1  1   1           5
## cg03172493   1   1   1  1   1           5
## cg11835797   1   1   1  1   1           5
## cg00841008   1   1   1  1   1           5
## cg18662228   1   1   1  1   1           5
## cg02302183   1   1   1  1   1           5
## cg13080267   1   1   1  1   1           5
## cg04831745   1   1   1  1   1           5
## cg11358878   1   1   1  1   1           5
## cg02901522   1   1   1  1   1           5
## cg14170504   1   1   1  1   1           5
## cg14924512   1   1   1  1   1           5
## cg16390578   1   1   1  1   1           5
## cg09247979   1   1   1  1   1           5
## cg24851651   1   1   1  1   1           5
## cg04242342   1   1   1  1   1           5
## cg18037388   1   1   1  1   1           5
## cg18821122   1   1   1  1   1           5
## cg04467639   1   1   1  1   1           5
## cg00977253   1   1   1  1   1           5
## cg08584917   1   1   1  1   1           5
## cg26889118   1   1   1  1   1           5
## cg14904299   1   1   1  1   1           5
## cg17329602   1   1   1  1   1           5
## cg06697310   1   1   1  1   1           5
## cg07456472   1   1   1  1   1           5
## cg23916408   1   1   1  1   1           5
## cg21533482   1   1   1  1   1           5
## cg11834635   1   1   1  1   1           5
## cg14465143   1   1   1  1   1           5
## cg16098618   1   1   1  1   1           5
## cg02656016   1   1   1  1   1           5
## cg05351360   1   1   1  1   1           5
## cg10507965   1   1   1  1   1           5
## cg17811452   1   1   1  1   1           5
## cg12284872   1   1   1  1   1           5
## cg00999469   1   1   1  1   1           5
## cg02823329   1   1   1  1   1           5
## cg08397053   1   1   1  1   1           5
## cg12279734   1   1   1  1   1           5
## cg06624143   1   1   1  1   1           5
## cg03628603   1   1   1  1   1           5
## cg02389264   1   1   1  1   1           5
## cg05373298   1   1   1  1   1           5
## cg04073914   1   1   1  1   1           5
## cg16268937   1   1   1  1   1           5
## cg03115532   1   1   1  1   1           5
## cg14252149   1   1   1  1   1           5
## cg10542624   1   1   1  1   1           5
## cg16361249   1   1   1  1   1           5
## cg13226272   1   1   1  1   1           5
## cg27224751   1   1   1  1   1           5
## cg12074150   1   1   1  1   1           5
## cg00332268   1   1   1  1   1           5
## cg27187580   1   1   1  1   1           5
## cg19555075   1   1   1  1   1           5
## cg04867412   1   1   1  1   1           5
## cg25174111   1   1   1  1   1           5
## cg15399577   1   1   1  1   1           5
## cg04033559   1   1   1  1   1           5
## cg11314779   1   1   1  1   1           5
## cg04845852   1   1   1  1   1           5
## cg04768387   1   1   1  1   1           5
## cg22653957   1   1   1  1   1           5
## cg24422984   1   1   1  1   1           5
## cg17002338   1   1   1  1   1           5
## cg21986118   1   1   1  1   1           5
## cg23813394   1   1   1  1   1           5
## cg02489327   1   1   1  1   1           5
## cg12466610   1   1   1  1   1           5
## cg04771146   1   1   1  1   1           5
## cg01608425   1   1   1  1   1           5
## cg07304760   1   1   1  1   1           5
## cg13885788   1   1   1  1   1           5
## cg11227702   1   1   1  1   1           5
## cg12689021   1   1   1  1   1           5
## cg17906851   1   1   1  1   1           5
## cg05377703   1   1   1  1   1           5
## cg02495179   1   1   1  1   1           5
## cg04664583   1   1   1  1   1           5
## cg26948066   1   1   1  1   1           5
## cg20094343   1   1   1  1   1           5
## cg00156497   1   1   1  1   1           5
## cg27341708   1   1   1  1   1           5
## cg02981548   1   1   1  1   1           5
## cg16020483   1   1   1  1   1           5
## cg18861767   1   1   1  1   1           5
## cg03327352   1   1   1  1   1           5
## cg27639199   1   1   1  1   1           5
## cg02627240   1   1   1  1   1           5
## cg22681945   1   1   1  1   1           5
## cg11109139   1   1   1  1   1           5
## cg02095601   1   1   1  1   1           5
##  [ reached 'max' / getOption("max.print") -- omitted 116 rows ]
Combine with the importance data frame
all_out_features <- union(combined_importance_freq_ordered_df$Feature, rownames(df_feature_Output_frequency))
# Note: the combined importance table used here is the one before filtering.
# Combine the tables based on the common-feature selection result:
# if a feature from the earlier importance table is missing here, add it and set its values to zero.
feature_output_df_full <- data.frame(Feature = all_out_features)
feature_output_df_full <- merge(feature_output_df_full, df_feature_Output_frequency, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_output_df_full[is.na(feature_output_df_full)] <- 0


# Attach the model importances for every output feature
all_output_impAvg_ordered_full <- data.frame(Feature = all_out_features)
all_output_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,
                                        all_output_impAvg_ordered_full, 
                                        by.x = "Feature", 
                                        by.y = "Feature", 
                                        all.x = TRUE)
all_output_impAvg_ordered_full[is.na(all_output_impAvg_ordered_full)] <- 0
all_Output_combined_df_impAvg <- merge(feature_output_df_full, 
                                all_output_impAvg_ordered_full, 
                                by = "Feature", 
                                all = TRUE)

print(head(feature_output_df_full))
##      Feature LRM XGB ENM RF SVM Total_Count
## 1    age.now   1   1   0  1   1           4
## 2 cg00051154   1   1   1  1   1           5
## 3 cg00156497   1   1   1  1   1           5
## 4 cg00322003   1   1   1  1   1           5
## 5 cg00332268   1   1   1  1   1           5
## 6 cg00421199   1   1   1  1   1           5
print(head(all_output_impAvg_ordered_full))
##      Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now     0.001866599      0.0000000       0.0000000     0.6139348            0.4          0.2031603
## 2 cg00051154     0.008799961      0.0000000       0.1421357     0.7665459            0.4          0.2634963
## 3 cg00156497     0.041645211      0.0000000       0.1365957     0.6326300            0.6          0.2821742
## 4 cg00322003     0.213499057      0.0715005       0.3143059     0.5848384            0.4          0.3168288
## 5 cg00332268     0.071615682      0.0000000       0.1245249     0.5517628            0.6          0.2695807
## 6 cg00421199     0.178414680      0.2872727       0.2686122     0.4599969            0.4          0.3188593
print(head(all_Output_combined_df_impAvg))
##      Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now   1   1   0  1   1           4     0.001866599      0.0000000       0.0000000     0.6139348            0.4          0.2031603
## 2 cg00051154   1   1   1  1   1           5     0.008799961      0.0000000       0.1421357     0.7665459            0.4          0.2634963
## 3 cg00156497   1   1   1  1   1           5     0.041645211      0.0000000       0.1365957     0.6326300            0.6          0.2821742
## 4 cg00322003   1   1   1  1   1           5     0.213499057      0.0715005       0.3143059     0.5848384            0.4          0.3168288
## 5 cg00332268   1   1   1  1   1           5     0.071615682      0.0000000       0.1245249     0.5517628            0.6          0.2695807
## 6 cg00421199   1   1   1  1   1           5     0.178414680      0.2872727       0.2686122     0.4599969            0.4          0.3188593
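The zero-fill merges above can be sketched on toy data (hypothetical feature names; `merge` with `all.x = TRUE` introduces `NA` for features missing from the right-hand table, and those `NA`s are then set to 0, matching the code above):

```r
# Toy sketch of the zero-fill merge (hypothetical feature names).
freq_tab <- data.frame(
  Feature     = c("cgA", "cgB"),
  Total_Count = c(5, 3)
)
imp_tab <- data.frame(
  Feature            = c("cgA", "cgC"),
  Average_Importance = c(0.42, 0.17)
)

# Union of features across both tables, as with all_out_features above.
all_feats <- union(freq_tab$Feature, imp_tab$Feature)

full <- data.frame(Feature = all_feats)
full <- merge(full, freq_tab, by = "Feature", all.x = TRUE)  # NA where a feature is absent
full <- merge(full, imp_tab,  by = "Feature", all.x = TRUE)
full[is.na(full)] <- 0  # zero-fill: cgB gets importance 0, cgC gets count 0

print(full)
```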
Frequency Feature Selection

Choose a feature as a common (mutual) important feature when it appears in the top selected important-feature lists of at least half of the models (i.e. at least 3 of the 5 models in our case).
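As a minimal illustration of this rule (toy counts, hypothetical feature names), a feature is kept when its `Total_Count` across the per-model indicator columns is at least 3:

```r
# Toy frequency table: rows are features, columns are per-model indicators
# marking whether each model selected the feature in its top list.
df_freq <- data.frame(
  LRM = c(1, 1, 0),
  XGB = c(1, 0, 0),
  ENM = c(1, 1, 1),
  RF  = c(1, 0, 0),
  SVM = c(1, 1, 0),
  row.names = c("cgA", "cgB", "cgC")
)
df_freq$Total_Count <- rowSums(df_freq)

# Keep features selected by at least half of the 5 models (>= 3).
common_feats <- rownames(df_freq[df_freq$Total_Count >= 3, ])
print(common_feats)  # "cgA" "cgB"
```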

if (METHOD_FEATURE_FLAG == 6) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])

  df_process_Output_freq <- processed_data_m6_df[, c("DX", df_process_frequency_FeatureName)]

  output_Frequency_Feature <- processed_data_m6[, c("DX", df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))

  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))

  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if (METHOD_FEATURE_FLAG == 5) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])

  df_process_Output_freq <- processed_data_m5_df[, c("DX", df_process_frequency_FeatureName)]

  output_Frequency_Feature <- processed_data_m5[, c("DX", df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))

  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))

  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if (METHOD_FEATURE_FLAG == 4) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])

  df_process_Output_freq <- processed_data_m4_df[, c("DX", df_process_frequency_FeatureName)]

  output_Frequency_Feature <- processed_data_m4[, c("DX", df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))

  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))

  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
## # A tibble: 6 × 263
##   DX         PC1     PC2 cg11787167 cg09216282 cg01680303 cg12080266 cg19503462 cg02356645 cg06378561 cg07152869 cg03084184 cg01013522 cg26739327 cg06864789 cg14780448 cg02932958 cg12858518 cg04124201
##   <fct>    <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CN    -0.173    0.0575     0.0467      0.924      0.134      0.945      0.454      0.583      0.938      0.505      0.788      0.886     0.769      0.461      0.670       0.421      0.929      0.331
## 2 CN    -0.00367  0.0837     0.326       0.926      0.757      0.936      0.700      0.570      0.515      0.835      0.455      0.543     0.873      0.875      0.621       0.383      0.902      0.324
## 3 Deme… -0.187   -0.0112     0.432       0.935      0.477      0.640      0.719      0.568      0.940      0.519      0.781      0.843     0.834      0.490      0.0443      0.762      0.919      0.433
## 4 CN    -0.0379   0.0157     0.465       0.866      0.513      0.575      0.421      0.919      0.927      0.808      0.773      0.824     0.105      0.0542     0.913       0.761      0.935      0.307
## 5 Deme… -0.139    0.0299     0.0569      0.921      0.110      0.539      0.740      0.907      0.927      0.773      0.944      0.512     0.757      0.835      0.911       0.396      0.912      0.375
## 6 CN    -0.213    0.0518     0.421       0.921      0.304      0.554      0.417      0.895      0.513      0.805      0.422      0.492     0.0827     0.374      0.655       0.699      0.896      0.373
## # ℹ 244 more variables: cg03982462 <dbl>, cg12306781 <dbl>, cg23432430 <dbl>, cg00322003 <dbl>, cg04109990 <dbl>, cg27114706 <dbl>, cg15775217 <dbl>, cg20218135 <dbl>, cg03392100 <dbl>,
## #   cg17044529 <dbl>, cg27452255 <dbl>, cg02078724 <dbl>, cg05096415 <dbl>, cg20507276 <dbl>, cg25561557 <dbl>, cg17623720 <dbl>, cg17118775 <dbl>, cg12471283 <dbl>, cg00421199 <dbl>,
## #   cg02217425 <dbl>, cg16338321 <dbl>, cg20913114 <dbl>, cg14764203 <dbl>, cg15730644 <dbl>, cg16715186 <dbl>, cg24861747 <dbl>, cg09584650 <dbl>, cg09650803 <dbl>, cg23698271 <dbl>,
## #   cg12702014 <dbl>, cg22901347 <dbl>, cg07584620 <dbl>, cg13799572 <dbl>, cg18339359 <dbl>, cg22274273 <dbl>, cg10701746 <dbl>, cg04798314 <dbl>, cg01280698 <dbl>, cg05749243 <dbl>,
## #   cg26474732 <dbl>, cg06870118 <dbl>, cg15700429 <dbl>, cg24065597 <dbl>, cg05841700 <dbl>, cg03640465 <dbl>, cg18526121 <dbl>, cg21575308 <dbl>, cg11716267 <dbl>, cg15591384 <dbl>,
## #   cg10786572 <dbl>, cg21578644 <dbl>, cg07138269 <dbl>, cg24104387 <dbl>, cg12240569 <dbl>, cg04218584 <dbl>, cg21501207 <dbl>, cg03172493 <dbl>, cg11835797 <dbl>, cg00841008 <dbl>,
## #   cg18662228 <dbl>, cg02302183 <dbl>, cg13080267 <dbl>, cg04831745 <dbl>, cg11358878 <dbl>, cg02901522 <dbl>, cg14170504 <dbl>, cg14924512 <dbl>, cg16390578 <dbl>, cg09247979 <dbl>, …
## [1] "The number of final used features of common importance method: 262"
##   [1] "PC1"        "PC2"        "cg11787167" "cg09216282" "cg01680303" "cg12080266" "cg19503462" "cg02356645" "cg06378561" "cg07152869" "cg03084184" "cg01013522" "cg26739327" "cg06864789" "cg14780448"
##  [16] "cg02932958" "cg12858518" "cg04124201" "cg03982462" "cg12306781" "cg23432430" "cg00322003" "cg04109990" "cg27114706" "cg15775217" "cg20218135" "cg03392100" "cg17044529" "cg27452255" "cg02078724"
##  [31] "cg05096415" "cg20507276" "cg25561557" "cg17623720" "cg17118775" "cg12471283" "cg00421199" "cg02217425" "cg16338321" "cg20913114" "cg14764203" "cg15730644" "cg16715186" "cg24861747" "cg09584650"
##  [46] "cg09650803" "cg23698271" "cg12702014" "cg22901347" "cg07584620" "cg13799572" "cg18339359" "cg22274273" "cg10701746" "cg04798314" "cg01280698" "cg05749243" "cg26474732" "cg06870118" "cg15700429"
##  [61] "cg24065597" "cg05841700" "cg03640465" "cg18526121" "cg21575308" "cg11716267" "cg15591384" "cg10786572" "cg21578644" "cg07138269" "cg24104387" "cg12240569" "cg04218584" "cg21501207" "cg03172493"
##  [76] "cg11835797" "cg00841008" "cg18662228" "cg02302183" "cg13080267" "cg04831745" "cg11358878" "cg02901522" "cg14170504" "cg14924512" "cg16390578" "cg09247979" "cg24851651" "cg04242342" "cg18037388"
##  [91] "cg18821122" "cg04467639" "cg00977253" "cg08584917" "cg26889118" "cg14904299" "cg17329602" "cg06697310" "cg07456472" "cg23916408" "cg21533482" "cg11834635" "cg14465143" "cg16098618" "cg02656016"
## [106] "cg05351360" "cg10507965" "cg17811452" "cg12284872" "cg00999469" "cg02823329" "cg08397053" "cg12279734" "cg06624143" "cg03628603" "cg02389264" "cg05373298" "cg04073914" "cg16268937" "cg03115532"
## [121] "cg14252149" "cg10542624" "cg16361249" "cg13226272" "cg27224751" "cg12074150" "cg00332268" "cg27187580" "cg19555075" "cg04867412" "cg25174111" "cg15399577" "cg04033559" "cg11314779" "cg04845852"
## [136] "cg04768387" "cg22653957" "cg24422984" "cg17002338" "cg21986118" "cg23813394" "cg02489327" "cg12466610" "cg04771146" "cg01608425" "cg07304760" "cg13885788" "cg11227702" "cg12689021" "cg17906851"
## [151] "cg05377703" "cg02495179" "cg04664583" "cg26948066" "cg20094343" "cg00156497" "cg27341708" "cg02981548" "cg16020483" "cg18861767" "cg03327352" "cg27639199" "cg02627240" "cg22681945" "cg11109139"
## [166] "cg02095601" "cg16733676" "cg16089727" "cg17419220" "cg17429539" "cg10058204" "cg12776173" "cg25758034" "cg06032337" "cg10829391" "cg26007606" "cg14181112" "cg26081710" "cg00051154" "cg01130884"
## [181] "cg17386240" "cg12333628" "cg26983017" "cg24638099" "PC3"        "cg19248407" "cg16310958" "cg23836570" "cg03167407" "cg06012621" "cg21757617" "cg05161773" "cg03359067" "cg02872767" "cg12108278"
## [196] "cg27286614" "cg24859648" "cg12556569" "cg16858433" "cg19512141" "cg06264882" "cg10666341" "cg00675157" "cg26052728" "cg08242313" "cg22071943" "cg12434901" "cg23840008" "cg11173002" "cg05059349"
## [211] "cg05321907" "cg23350716" "cg00648024" "cg11706829" "cg02494911" "cg10844498" "cg03187614" "cg04970287" "cg12213037" "cg05813498" "cg20678988" "cg18029737" "cg12012426" "cg12421087" "cg16431720"
## [226] "age.now"    "cg17296678" "cg26901661" "cg07951602" "cg17348244" "cg03057303" "cg07971231" "cg01097733" "cg04577745" "cg05125667" "cg20070588" "cg15535896" "cg12293347" "cg26757229" "cg06875704"
## [241] "cg22251955" "cg23947654" "cg09518270" "cg06536614" "cg11331837" "cg23161429" "cg09993718" "cg00729708" "cg19848641" "cg12738248" "cg01802772" "cg10985055" "cg03088219" "cg16536985" "cg26089705"
## [256] "cg12925689" "cg05130642" "cg05138546" "cg16527629" "cg11826549" "cg06002867" "cg20704148"
##                           DX          PC1         PC2 cg11787167 cg09216282 cg01680303 cg12080266 cg19503462 cg02356645 cg06378561 cg07152869 cg03084184 cg01013522 cg26739327 cg06864789 cg14780448
## 200223270003_R03C01       CN -0.172761185  0.05745834 0.04673831  0.9244259  0.1344941  0.9450629  0.4537684  0.5833923  0.9377503   0.505063  0.7877128  0.8862821  0.7693268  0.4605312 0.67021018
## 200223270003_R06C01       CN -0.003667305  0.08372861 0.32564508  0.9263996  0.7573869  0.9363381  0.6997359  0.5701428  0.5154019   0.835249  0.4546397  0.5425308  0.8727608  0.8751365 0.62073547
## 200223270003_R07C01 Dementia -0.186779607 -0.01117250 0.43162543  0.9352308  0.4772204  0.6398247  0.7189778  0.5683381  0.9403569   0.519430  0.7812413  0.8429862  0.8340445  0.4902033 0.04425741
##                     cg02932958 cg12858518 cg04124201 cg03982462 cg12306781 cg23432430 cg00322003 cg04109990 cg27114706 cg15775217 cg20218135 cg03392100 cg17044529 cg27452255 cg02078724 cg05096415
## 200223270003_R03C01  0.4210489  0.9285252  0.3308589  0.6023731  0.8663817  0.9455418  0.5702070  0.6476604  0.9359259  0.9168327 0.64278153  0.9227394  0.9117895  0.6593379  0.2896133  0.5177819
## 200223270003_R06C01  0.3825995  0.9017533  0.3241613  0.8778458  0.8027798  0.9418716  0.3077122  0.6692040  0.9285384  0.6042521 0.06509247  0.8902340  0.9290636  0.9012217  0.2805612  0.6288426
## 200223270003_R07C01  0.7617081  0.9187879  0.4332693  0.8860227  0.8787250  0.9426559  0.6104341  0.9024920  0.4787397  0.9062231 0.65642359  0.4359657  0.9402858  0.8898635  0.2739571  0.6060271
##                     cg20507276 cg25561557 cg17623720 cg17118775 cg12471283 cg00421199 cg02217425 cg16338321 cg20913114 cg14764203 cg15730644 cg16715186 cg24861747 cg09584650 cg09650803 cg23698271
## 200223270003_R03C01 0.38721972 0.03851635  0.8988624  0.5585676  0.8658731  0.8532461  0.1032503  0.8294062 0.80382984  0.4683709  0.4353906  0.7946153  0.4309505 0.09661586  0.8954464  0.9109565
## 200223270003_R06C01 0.47978438 0.47259480  0.8172384  0.2916054  0.6963410  0.8891803  0.6592850  0.4918708 0.03158439  0.8916566  0.8763048  0.8124316  0.8071462 0.52399749  0.9113477  0.9051701
## 200223270003_R07C01 0.02261996 0.43364249  0.8226085  0.2868948  0.6680611  0.8937751  0.8792021  0.5245645 0.81256840  0.8714472  0.4833709  0.7773263  0.3347317 0.11587211  0.2518414  0.8804362
##                     cg12702014  cg22901347 cg07584620 cg13799572 cg18339359 cg22274273 cg10701746 cg04798314 cg01280698 cg05749243 cg26474732 cg06870118 cg15700429 cg24065597 cg05841700 cg03640465
## 200223270003_R03C01  0.7848681 0.001690332  0.3763980  0.8449584  0.9040272  0.4246379  0.4868342 0.07119798 0.88462009  0.9209685  0.8184088  0.8100144  0.9114530  0.2221098  0.9146488  0.2531644
## 200223270003_R06C01  0.8065993 0.103413834  0.8530961  0.4409219  0.8552121  0.4196796  0.4927257 0.09248843 0.88471320  0.9143061  0.7358417  0.7802055  0.8838233  0.7036129  0.3737990  0.2904433
## 200223270003_R07C01  0.7458594 0.632991482  0.3888623  0.8516975  0.3073106  0.4164100  0.8552180 0.06972566 0.06370005  0.9121180  0.7509296  0.7917257  0.9095363  0.2407676  0.5046468  0.9024530
##                     cg18526121 cg21575308 cg11716267 cg15591384 cg10786572 cg21578644 cg07138269 cg24104387 cg12240569 cg04218584 cg21501207 cg03172493 cg11835797 cg00841008 cg18662228 cg02302183
## 200223270003_R03C01  0.4762313 0.44702405 0.04959702  0.7870275  0.5982086  0.9260863  0.9426707  0.5339034 0.02690547  0.8971263  0.6813712 0.63362492  0.9007408 0.61899333  0.8730153  0.9191148
## 200223270003_R06C01  0.4833367 0.44792570 0.49143010  0.7429614  0.0935115  0.9159726  0.5057781  0.3007614 0.46030640  0.8491768  0.4747229 0.06148804  0.8944957 0.05401588  0.8602464  0.8749250
## 200223270003_R07C01  0.7761450 0.02822675 0.45857830  0.8346279  0.8436837  0.9178001  0.9400527  0.7509780 0.86185839  0.9008137  0.7422003 0.64562298  0.8168544 0.90769205  0.8683578  0.8888247
##                     cg13080267 cg04831745 cg11358878 cg02901522 cg14170504 cg14924512 cg16390578 cg09247979 cg24851651 cg04242342 cg18037388 cg18821122 cg04467639 cg00977253 cg08584917 cg26889118
## 200223270003_R03C01 0.78371483 0.71214149 0.83252951  0.9372901 0.02236650  0.9160885 0.20983422  0.5706177 0.05358297  0.8167892  0.7545086  0.5901603  0.6400206  0.9145988  0.9019732  0.9154836
## 200223270003_R06C01 0.09436069 0.06871768 0.87521203  0.4954978 0.02988245  0.9088414 0.06389068  0.5090215 0.05968923  0.8040357  0.7294565  0.5779620  0.5657041  0.8944518  0.9187789  0.9101336
## 200223270003_R07C01 0.09351259 0.90994644 0.08917903  0.9381188 0.48543531  0.9081681 0.23101450  0.5066661 0.60864179  0.8286115  0.2391659  0.9251431  0.6302917  0.9150206  0.6007449  0.5759967
##                     cg14904299 cg17329602 cg06697310 cg07456472 cg23916408 cg21533482 cg11834635 cg14465143 cg16098618 cg02656016 cg05351360 cg10507965 cg17811452 cg12284872 cg00999469 cg02823329
## 200223270003_R03C01  0.2712472  0.8189317  0.8653044  0.5856904  0.9154993  0.8288469  0.8880887  0.5543068  0.2571464  0.2355680 0.03855181  0.4010973 0.82740141  0.7414569  0.2857719  0.6464005
## 200223270003_R06C01  0.8364544  0.8478185  0.2405168  0.3886482  0.8886255  0.6766373  0.2493491  0.2702875  0.6899734  0.9052318 0.76395533  0.4033691 0.09338396  0.7725267  0.2499229  0.9633930
## 200223270003_R07C01  0.8193867  0.8596400  0.8479193  0.9186405  0.8872447  0.6235932  0.2210428  0.2621492  0.6488005  0.8653682 0.77000888  0.3869543 0.79817238  0.7573369  0.2819622  0.6617541
##                     cg08397053 cg12279734 cg06624143 cg03628603 cg02389264 cg05373298 cg04073914 cg16268937 cg03115532 cg14252149 cg10542624 cg16361249 cg13226272 cg27224751 cg12074150 cg00332268
## 200223270003_R03C01 0.04199567  0.1494651  0.4899758  0.9157246  0.7900942 0.02652391 0.03089677  0.8931712  0.8659608 0.02450779 0.02189577 0.52843073  0.5410002 0.03214912 0.18602738  0.9044887
## 200223270003_R06C01 0.04437741  0.8760759  0.9107688  0.8851075  0.7789974 0.83538124 0.89962516  0.9034556  0.8533871 0.02382413 0.54330620 0.09039669  0.4437070 0.83123722 0.14231506  0.5777209
## 200223270003_R07C01 0.59796746  0.8674214  0.9217350  0.8923890  0.4174463 0.89506024 0.47195215  0.8928450  0.4416574 0.56212480 0.54991492 0.42039062  0.0265215 0.79732117 0.09201303  0.5848006
##                     cg27187580 cg19555075 cg04867412 cg25174111 cg15399577 cg04033559 cg11314779 cg04845852 cg04768387 cg22653957 cg24422984 cg17002338 cg21986118 cg23813394 cg02489327 cg12466610
## 200223270003_R03C01  0.6643576  0.4921409  0.8796800  0.8573844  0.8785443  0.8768243  0.8966100  0.9212268  0.9465814  0.6442184  0.5462594  0.2684163  0.6571296 0.48811365  0.8616312 0.59131778
## 200223270003_R06C01  0.6914924  0.4261618  0.4497115  0.2567745  0.8703169  0.8257388  0.8908661  0.5118209  0.9098563  0.9531308  0.5193121  0.2811103  0.7034445 0.02943436  0.8777949 0.06939623
## 200223270003_R07C01  0.9357074  0.4694729  0.4445373  0.1903803  0.8968856  0.8900962  0.9048316  0.9034373  0.9413240  0.6534542  0.1970387  0.2706349  0.9055894 0.92935625  0.4205073 0.04527733
##                     cg04771146 cg01608425 cg07304760 cg13885788 cg11227702 cg12689021 cg17906851 cg05377703 cg02495179 cg04664583 cg26948066 cg20094343 cg00156497 cg27341708 cg02981548 cg16020483
## 200223270003_R03C01  0.7648566  0.9264388  0.5798534  0.9369476 0.49184121  0.7449475  0.9529718  0.8213047  0.7373055  0.5881190  0.5026045  0.7128750  0.5194653 0.02613847  0.5220037  0.1673606
## 200223270003_R06C01  0.3125007  0.8887753  0.5575516  0.5163017 0.02543724  0.7872237  0.6462151  0.5152514  0.5588114  0.9352717  0.9101976  0.3291595  0.9024063 0.86893582  0.5098965  0.1209622
## 200223270003_R07C01  0.2909958  0.9065432  0.9195617  0.9183376 0.45150971  0.7523141  0.9553497  0.7773036  0.5273309  0.9350230  0.9379543  0.4013815  0.9067989 0.02642300  0.5660985  0.2499647
##                     cg18861767 cg03327352 cg27639199 cg02627240 cg22681945 cg11109139 cg02095601 cg16733676 cg16089727 cg17419220 cg17429539 cg10058204 cg12776173 cg25758034 cg06032337 cg10829391
## 200223270003_R03C01  0.7847380  0.8786878 0.67552763 0.57129408  0.8388195  0.6350109  0.9161259  0.8904541 0.54996692 0.43470227  0.7100923  0.5834496  0.8730635  0.6649219  0.5657198  0.5929616
## 200223270003_R06C01  0.4734572  0.3042310 0.06233093 0.05309659  0.8700500  0.6904482  0.2233062  0.1698111 0.05876736 0.02781411  0.7660838  0.0549494  0.7009491  0.2393844  0.5653758  0.9411947
## 200223270003_R07C01  0.7312175  0.8273211 0.05701332 0.52179136  0.3344105  0.6274025  0.8978191  0.9203317 0.85485461 0.42803809  0.6984969  0.5689591  0.1136716  0.7071501  0.5229594  0.9322956
##                     cg26007606 cg14181112 cg26081710 cg00051154 cg01130884 cg17386240 cg12333628 cg26983017 cg24638099          PC3 cg19248407 cg16310958 cg23836570 cg03167407 cg06012621 cg21757617
## 200223270003_R03C01  0.5615550  0.1615405  0.9198212 0.08370609  0.6230659  0.7144809  0.9092861 0.03145466  0.4262170  0.005055871  0.8313131  0.9300073 0.54259383  0.7610292  0.8579519  0.4429909
## 200223270003_R06C01  0.1463111  0.3424621  0.8801892 0.61288950  0.2847748  0.8074824  0.5084647 0.84677625  0.8787392  0.029143653  0.8525281  0.9228871 0.03267304  0.3087606  0.5325037  0.4472538
## 200223270003_R07C01  0.8101800  0.2178314  0.9153264 0.07638127  0.2313285  0.7227918  0.5229394 0.53922255  0.8682765 -0.032302430  0.8467857  0.8539019 0.59939745  0.2455453  0.6263080  0.4339315
##                     cg05161773 cg03359067 cg02872767 cg12108278 cg27286614 cg24859648 cg12556569 cg16858433 cg19512141 cg06264882 cg10666341 cg00675157 cg26052728 cg08242313 cg22071943 cg12434901
## 200223270003_R03C01  0.4154907  0.8628564  0.3886537  0.9243869  0.5933858 0.44392797 0.03924599  0.9194211  0.7903543 0.43678655  0.6731062  0.9242325  0.1513937  0.8953645  0.2442648  0.8458468
## 200223270003_R06C01  0.8526849  0.8144536  0.9099575  0.9068995  0.6348795 0.03341185 0.48636893  0.9271632  0.8404684 0.43703442  0.6443180  0.9254708  0.5254754  0.8573493  0.2644581  0.8299579
## 200223270003_R07C01  0.4259275  0.8737908  0.8603283  0.9131367  0.9468370 0.43582347 0.46498877  0.9288986  0.2202759 0.02439581  0.8970292  0.5447244  0.5600724  0.8992114  0.2599947  0.8482994
##                     cg23840008 cg11173002 cg05059349 cg05321907 cg23350716 cg00648024 cg11706829 cg02494911 cg10844498 cg03187614 cg04970287 cg12213037 cg05813498 cg20678988 cg18029737 cg12012426
## 200223270003_R03C01 0.66547425  0.5913599 0.04507417  0.1782629  0.7876873 0.40202875  0.5444785  0.2416332  0.1391318  0.8826518  0.8875750   0.248785  0.9039353  0.8548886  0.9016634  0.9434768
## 200223270003_R06C01 0.88483246  0.1878736 0.03898752  0.8427929  0.6960544 0.05579011  0.5669449  0.2520909  0.1385549  0.5131472  0.4651667   0.812695  0.6252849  0.7786685  0.7376586  0.9220044
## 200223270003_R07C01 0.09020907  0.5150840 0.85329923  0.8320504  0.7387498 0.03708944  0.8746449  0.2457032  0.7374725  0.5281030  0.9092326   0.506374  0.9086932  0.8260541  0.9397667  0.9241284
##                     cg12421087 cg16431720  age.now cg17296678 cg26901661 cg07951602 cg17348244 cg03057303 cg07971231 cg01097733 cg04577745 cg05125667 cg20070588 cg15535896 cg12293347 cg26757229
## 200223270003_R03C01  0.5399655  0.8692449 78.60000  0.5653917  0.8754981  0.8766842 0.81793075  0.8923039  0.8406145  0.6753081  0.2681033 0.54151552  0.5057088  0.9253926  0.9253031  0.1422661
## 200223270003_R06C01  0.5400348  0.8773137 80.40000  0.5272971  0.9021064  0.8918089 0.07241099  0.4954311  0.8447914  0.9131513  0.8570624 0.49090787  0.8654344  0.3320191  0.9176094  0.7933794
## 200223270003_R07C01  0.5291975  0.8988328 78.16441  0.7661613  0.8556831  0.8706938 0.78025001  0.4695066  0.8874706  0.6832952  0.9002276 0.01590936  0.8425849  0.9409104  0.6028463  0.8074830
##                     cg06875704 cg22251955 cg23947654 cg09518270 cg06536614 cg11331837 cg23161429 cg09993718 cg00729708 cg19848641 cg12738248 cg01802772 cg10985055  cg03088219 cg16536985 cg26089705
## 200223270003_R03C01  0.9181165 0.02254441  0.8079296  0.8870663  0.5746694 0.57150125  0.9099619  0.7227856  0.1188099  0.9155493 0.88010292 0.02361869  0.8631895 0.007435243  0.5418687 0.50810373
## 200223270003_R06C01  0.9200461 0.02714054  0.8017579  0.8765622  0.5773468 0.03182862  0.8833895  0.4378752  0.1206326  0.4888000 0.51121855 0.02401520  0.5456633 0.120155222  0.8392044 0.03322136
## 200223270003_R07C01  0.9048289 0.89577950  0.7584946  0.8135001  0.5848917 0.03832164  0.9134709  0.7067889  0.7636159  0.9139292 0.09131476 0.02200957  0.8825100 0.826554308  0.8822891 0.03118009
##                     cg12925689 cg05130642 cg05138546 cg16527629 cg11826549 cg06002867 cg20704148
## 200223270003_R03C01 0.38196419  0.8644077  0.6230487  0.4365003 0.04794983 0.84888752 0.02409027
## 200223270003_R06C01 0.02873309  0.3661324  0.8963047  0.0708336 0.03672380 0.02698175 0.02580923
## 200223270003_R07C01 0.38592071  0.3039272  0.9057159  0.4492586 0.51173417 0.48042117 0.47357786
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
if (METHOD_FEATURE_FLAG == 3) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])

  df_process_Output_freq <- processed_data_m3_df[, c("DX", df_process_frequency_FeatureName)]

  output_Frequency_Feature <- processed_data_m3[, c("DX", df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))

  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))

  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if (METHOD_FEATURE_FLAG == 1) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])

  df_process_Output_freq <- processed_data_m1_df[, c("DX", df_process_frequency_FeatureName)]

  output_Frequency_Feature <- processed_data_m1[, c("DX", df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))

  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))

  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
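The five `METHOD_FEATURE_FLAG` branches above differ only in which `processed_data_m<flag>` / `processed_data_m<flag>_df` pair they index. Assuming that naming convention holds for every flag, they could be collapsed into one branch with `get()`. A hedged, runnable sketch (the toy stand-in objects at the top exist only to make the sketch self-contained; in the report they are already defined):

```r
# Toy stand-ins so the sketch runs on its own; in the report these
# objects (frequency table, per-method processed data) already exist.
df_feature_Output_frequency <- data.frame(
  Total_Count = c(5, 2), row.names = c("cgA", "cgB")
)
processed_data_m1    <- data.frame(DX = c("CN", "Dementia"),
                                   cgA = c(0.1, 0.2), cgB = c(0.3, 0.4))
processed_data_m1_df <- processed_data_m1
METHOD_FEATURE_FLAG  <- 1

# One branch instead of five: build the object names from the flag.
if (METHOD_FEATURE_FLAG %in% c(1, 3, 4, 5, 6)) {
  df_process_frequency_FeatureName <-
    rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, , drop = FALSE])

  cols <- c("DX", df_process_frequency_FeatureName)
  df_process_Output_freq   <- get(paste0("processed_data_m", METHOD_FEATURE_FLAG, "_df"))[, cols]
  output_Frequency_Feature <- get(paste0("processed_data_m", METHOD_FEATURE_FLAG))[, cols]

  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:",
              length(df_process_frequency_FeatureName)))
}
```

This keeps the selection logic in one place, so a change to the `Total_Count` threshold only has to be made once.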
print(df_process_frequency_FeatureName)
##   [1] "PC1"        "PC2"        "cg11787167" "cg09216282" "cg01680303" "cg12080266" "cg19503462" "cg02356645" "cg06378561" "cg07152869" "cg03084184" "cg01013522" "cg26739327" "cg06864789" "cg14780448"
##  [16] "cg02932958" "cg12858518" "cg04124201" "cg03982462" "cg12306781" "cg23432430" "cg00322003" "cg04109990" "cg27114706" "cg15775217" "cg20218135" "cg03392100" "cg17044529" "cg27452255" "cg02078724"
##  [31] "cg05096415" "cg20507276" "cg25561557" "cg17623720" "cg17118775" "cg12471283" "cg00421199" "cg02217425" "cg16338321" "cg20913114" "cg14764203" "cg15730644" "cg16715186" "cg24861747" "cg09584650"
##  [46] "cg09650803" "cg23698271" "cg12702014" "cg22901347" "cg07584620" "cg13799572" "cg18339359" "cg22274273" "cg10701746" "cg04798314" "cg01280698" "cg05749243" "cg26474732" "cg06870118" "cg15700429"
##  [61] "cg24065597" "cg05841700" "cg03640465" "cg18526121" "cg21575308" "cg11716267" "cg15591384" "cg10786572" "cg21578644" "cg07138269" "cg24104387" "cg12240569" "cg04218584" "cg21501207" "cg03172493"
##  [76] "cg11835797" "cg00841008" "cg18662228" "cg02302183" "cg13080267" "cg04831745" "cg11358878" "cg02901522" "cg14170504" "cg14924512" "cg16390578" "cg09247979" "cg24851651" "cg04242342" "cg18037388"
##  [91] "cg18821122" "cg04467639" "cg00977253" "cg08584917" "cg26889118" "cg14904299" "cg17329602" "cg06697310" "cg07456472" "cg23916408" "cg21533482" "cg11834635" "cg14465143" "cg16098618" "cg02656016"
## [106] "cg05351360" "cg10507965" "cg17811452" "cg12284872" "cg00999469" "cg02823329" "cg08397053" "cg12279734" "cg06624143" "cg03628603" "cg02389264" "cg05373298" "cg04073914" "cg16268937" "cg03115532"
## [121] "cg14252149" "cg10542624" "cg16361249" "cg13226272" "cg27224751" "cg12074150" "cg00332268" "cg27187580" "cg19555075" "cg04867412" "cg25174111" "cg15399577" "cg04033559" "cg11314779" "cg04845852"
## [136] "cg04768387" "cg22653957" "cg24422984" "cg17002338" "cg21986118" "cg23813394" "cg02489327" "cg12466610" "cg04771146" "cg01608425" "cg07304760" "cg13885788" "cg11227702" "cg12689021" "cg17906851"
## [151] "cg05377703" "cg02495179" "cg04664583" "cg26948066" "cg20094343" "cg00156497" "cg27341708" "cg02981548" "cg16020483" "cg18861767" "cg03327352" "cg27639199" "cg02627240" "cg22681945" "cg11109139"
## [166] "cg02095601" "cg16733676" "cg16089727" "cg17419220" "cg17429539" "cg10058204" "cg12776173" "cg25758034" "cg06032337" "cg10829391" "cg26007606" "cg14181112" "cg26081710" "cg00051154" "cg01130884"
## [181] "cg17386240" "cg12333628" "cg26983017" "cg24638099" "PC3"        "cg19248407" "cg16310958" "cg23836570" "cg03167407" "cg06012621" "cg21757617" "cg05161773" "cg03359067" "cg02872767" "cg12108278"
## [196] "cg27286614" "cg24859648" "cg12556569" "cg16858433" "cg19512141" "cg06264882" "cg10666341" "cg00675157" "cg26052728" "cg08242313" "cg22071943" "cg12434901" "cg23840008" "cg11173002" "cg05059349"
## [211] "cg05321907" "cg23350716" "cg00648024" "cg11706829" "cg02494911" "cg10844498" "cg03187614" "cg04970287" "cg12213037" "cg05813498" "cg20678988" "cg18029737" "cg12012426" "cg12421087" "cg16431720"
## [226] "age.now"    "cg17296678" "cg26901661" "cg07951602" "cg17348244" "cg03057303" "cg07971231" "cg01097733" "cg04577745" "cg05125667" "cg20070588" "cg15535896" "cg12293347" "cg26757229" "cg06875704"
## [241] "cg22251955" "cg23947654" "cg09518270" "cg06536614" "cg11331837" "cg23161429" "cg09993718" "cg00729708" "cg19848641" "cg12738248" "cg01802772" "cg10985055" "cg03088219" "cg16536985" "cg26089705"
## [256] "cg12925689" "cg05130642" "cg05138546" "cg16527629" "cg11826549" "cg06002867" "cg20704148"
Importance of these features:
Selected_Frequency_Feature_importance <- all_Output_combined_df_impAvg[all_Output_combined_df_impAvg$Total_Count >= 3, ]
print(Selected_Frequency_Feature_importance)
##       Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1     age.now   1   1   0  1   1           4     0.001866599     0.00000000     0.000000000     0.6139348            0.4          0.2031603
## 2  cg00051154   1   1   1  1   1           5     0.008799961     0.00000000     0.142135650     0.7665459            0.4          0.2634963
## 3  cg00156497   1   1   1  1   1           5     0.041645211     0.00000000     0.136595660     0.6326300            0.6          0.2821742
## 4  cg00322003   1   1   1  1   1           5     0.213499057     0.07150050     0.314305897     0.5848384            0.4          0.3168288
## 5  cg00332268   1   1   1  1   1           5     0.071615682     0.00000000     0.124524857     0.5517628            0.6          0.2695807
## 6  cg00421199   1   1   1  1   1           5     0.178414680     0.28727272     0.268612171     0.4599969            0.4          0.3188593
## 7  cg00648024   1   1   1  1   0           4     0.094232456     0.03535404     0.164267881     0.6135763            0.2          0.2214861
## 8  cg00675157   1   1   1  1   0           4     0.151885920     0.01291652     0.234806524     0.6576521            0.2          0.2514522
## 9  cg00729708   1   0   1  1   0           3     0.027206634     0.00000000     0.147571877     0.4901941            0.0          0.1329945
## 10 cg00841008   1   1   1  1   1           5     0.121911839     0.11874762     0.264310311     0.3697473            0.4          0.2549434
## 11 cg00977253   1   1   1  1   1           5     0.099275476     0.00000000     0.274998888     0.4279565            0.6          0.2804462
## 12 cg00999469   1   1   1  1   1           5     0.083433465     0.89724846     0.161381644     0.5247286            0.6          0.4533584
## 13 cg01013522   1   1   1  1   1           5     0.249215052     0.93951297     0.354283163     0.6198419            0.8          0.5925706
## 14 cg01097733   0   1   1  1   1           4     0.000000000     0.00000000     0.134377320     0.4114112            0.6          0.2291577
## 15 cg01130884   1   1   1  1   1           5     0.008565557     0.07199064     0.143232537     0.3618005            0.6          0.2371178
## 16 cg01280698   1   1   1  1   1           5     0.152722775     0.15764983     0.208769836     0.3436085            0.6          0.2925502
## 17 cg01608425   1   1   1  1   1           5     0.053622101     0.00000000     0.093243504     0.3987092            0.6          0.2291150
## 18 cg01680303   1   1   1  1   1           5     0.309902316     0.07636624     0.331702960     0.4608488            1.0          0.4357641
## 19 cg01802772   1   0   1  1   0           3     0.008695874     0.00000000     0.147856701     0.3162090            0.4          0.1745523
## 20 cg02078724   1   1   1  1   1           5     0.192778214     0.35839114     0.248572956     0.7252550            0.8          0.4649995
## 21 cg02095601   1   1   1  1   1           5     0.029930583     0.49133441     0.114961750     0.3634825            0.6          0.3199418
## 22 cg02217425   1   1   1  1   1           5     0.176884039     0.02943825     0.204001685     0.6966197            0.6          0.3413887
## 23 cg02302183   1   1   1  1   1           5     0.117770736     0.00000000     0.179270192     0.4512942            0.6          0.2696670
## 24 cg02356645   1   1   1  1   1           5     0.279482038     0.58786787     0.329596320     0.6062590            0.6          0.4806411
## 25 cg02389264   1   1   1  1   1           5     0.078467728     0.23338384     0.243931858     0.5799540            0.4          0.3071475
## 27 cg02489327   1   1   1  1   1           5     0.057382786     0.02021036     0.125892637     0.5610407            0.6          0.2729053
## 28 cg02494911   1   1   1  1   0           4     0.078753403     0.07702494     0.196181405     0.2763833            0.2          0.1656686
## 29 cg02495179   1   1   1  1   1           5     0.044122402     0.00000000     0.160878368     0.6241604            0.6          0.2858322
## 30 cg02627240   1   1   1  1   1           5     0.030961226     0.00000000     0.149519815     0.7292690            0.6          0.3019500
## 31 cg02656016   1   1   1  1   1           5     0.087787432     0.00000000     0.158458622     0.4752933            0.8          0.3043079
## 32 cg02823329   1   1   1  1   1           5     0.082359451     0.00000000     0.116123064     0.3329119            0.6          0.2262789
## 33 cg02872767   1   1   1  0   1           4     0.355859301     0.00000000     0.369732343     0.2713600            0.6          0.3193903
## 34 cg02901522   1   1   1  1   1           5     0.114464012     0.21430682     0.143969439     0.3271845            0.8          0.3199850
## 35 cg02932958   1   1   1  1   1           5     0.236524383     0.00000000     0.270688231     0.5306875            0.4          0.2875800
## 36 cg02981548   1   1   1  1   1           5     0.039135766     0.00000000     0.155480080     0.5620632            0.6          0.2713358
## 37 cg03057303   0   1   1  1   1           4     0.000000000     0.00000000     0.141203014     0.6960341            0.4          0.2474474
## 38 cg03084184   1   1   1  1   1           5     0.259990880     0.33122659     0.297498895     0.6343994            0.4          0.3846231
## 39 cg03088219   0   1   0  1   1           3     0.000000000     0.00000000     0.071260943     0.6499358            0.6          0.2642393
## 40 cg03115532   1   1   1  1   1           5     0.076821335     0.01195597     0.215592671     0.7689515            0.4          0.2946643
## 41 cg03167407   1   1   1  1   1           5     0.000000000     0.00000000     0.117253620     0.7805749            0.6          0.2995657
## 42 cg03172493   1   1   1  1   1           5     0.126841100     0.61338315     0.242779223     0.6919366            0.4          0.4149880
## 43 cg03187614   1   1   0  1   1           4     0.064763337     0.00000000     0.077996819     0.5044711            0.8          0.2894462
## 44 cg03327352   1   1   1  1   1           5     0.034673740     0.00000000     0.141204090     0.6736268            0.6          0.2899009
## 45 cg03359067   1   1   1  1   1           5     0.000000000     0.00000000     0.112932443     0.6252027            0.6          0.2676270
## 46 cg03392100   1   1   1  1   1           5     0.197948924     0.00000000     0.257219235     0.5249867            0.6          0.3160310
## 47 cg03628603   1   1   1  1   1           5     0.078518680     0.00000000     0.145016303     0.2979385            0.6          0.2242947
## 48 cg03640465   1   1   1  1   1           5     0.142752046     0.00000000     0.182351648     0.4179584            0.4          0.2286124
## 49 cg03982462   1   1   1  1   1           5     0.226377140     0.24822759     0.334149366     0.5503044            0.4          0.3518117
## 50 cg04033559   1   1   1  1   1           5     0.067618821     0.00000000     0.131156327     0.4486283            0.6          0.2494807
## 51 cg04073914   1   1   1  1   1           5     0.077615688     0.00000000     0.138426471     0.4318025            0.4          0.2095689
## 52 cg04109990   1   1   1  1   1           5     0.213353683     0.28274162     0.350486645     0.6216524            0.4          0.3736469
## 53 cg04124201   1   1   1  1   1           5     0.229926320     0.54853455     0.343343505     1.0000000            0.6          0.5443609
## 54 cg04218584   1   1   1  1   1           5     0.134853809     0.05809631     0.229250030     0.4423310            0.6          0.2929062
## 55 cg04242342   1   1   1  1   1           5     0.107148318     0.61308375     0.186563251     0.6781069            0.6          0.4369804
## 56 cg04467639   1   1   1  1   1           5     0.102211003     0.00000000     0.186777665     0.5462507            0.6          0.2870479
## 57 cg04577745   0   1   1  1   1           4     0.000000000     0.00000000     0.155012058     0.3597206            0.6          0.2229465
## 58 cg04664583   1   1   1  1   1           5     0.044098245     0.00000000     0.136774767     0.4531712            0.8          0.2868088
## 59 cg04768387   1   1   1  1   1           5     0.066113348     0.00000000     0.115165204     0.5407034            0.4          0.2243964
## 60 cg04771146   1   1   1  1   1           5     0.053951904     0.00000000     0.098112247     0.6572234            0.8          0.3218575
## 61 cg04798314   1   1   1  1   1           5     0.153372968     0.05012032     0.203218134     0.2992193            0.6          0.2611861
## 62 cg04831745   1   1   1  1   1           5     0.117057505     0.00000000     0.214937247     0.7274604            0.4          0.2918910
## 63 cg04845852   1   1   1  1   1           5     0.066936773     0.00000000     0.135345102     0.3680799            0.6          0.2340724
## 64 cg04867412   1   1   1  1   1           5     0.068785222     0.01358368     0.096609275     0.6519996            0.8          0.3261956
## 65 cg04970287   1   1   1  0   1           4     0.056612766     0.00000000     0.181837016     0.2477092            0.6          0.2172318
## 66 cg05059349   1   1   1  0   1           4     0.097820521     0.00000000     0.198214909     0.2209276            0.6          0.2233926
## 67 cg05096415   1   1   1  1   1           5     0.190895421     0.25266607     0.252740313     0.6535302            0.6          0.3899664
## 68 cg05125667   0   1   1  1   1           4     0.000000000     0.00000000     0.085223274     0.3928046            0.6          0.2156056
## 69 cg05130642   0   1   0  1   1           3     0.000000000     0.00000000     0.047798348     0.7246983            0.4          0.2344993
## 70 cg05138546   0   1   0  1   1           3     0.000000000     0.00000000     0.005965926     0.5500724            0.6          0.2312077
## 71 cg05161773   1   1   1  1   1           5     0.000000000     0.00000000     0.086195337     0.6929053            0.6          0.2758201
## 72 cg05321907   1   1   1  0   1           4     0.096367814     0.00000000     0.195209364     0.1790331            0.6          0.2141221
## 73 cg05351360   1   1   1  1   1           5     0.087310265     0.00000000     0.108503805     0.2842888            0.8          0.2560206
## 74 cg05373298   1   1   1  1   1           5     0.077699447     0.00000000     0.228168617     0.6963314            0.6          0.3204399
## 75 cg05377703   1   1   1  1   1           5     0.045379319     0.00000000     0.160176765     0.5824055            0.6          0.2775923
## 77 cg05749243   1   1   1  1   1           5     0.152093812     0.19494480     0.282771448     0.7459423            0.4          0.3551505
## 78 cg05813498   1   1   0  1   1           4     0.040088810     0.00000000     0.079890859     0.4603695            0.6          0.2360698
##  [ reached 'max' / getOption("max.print") -- omitted 186 rows ]

8.2 Output - Write Files

Data Frame with selected features

# Output the data frame with features selected by the mean method
# ("selected_impAvg_ordered_NAME"). This data frame does not have a column named "SampleID".

if (Flag_8mean) {
  filename_mean <- paste0("Selected_mean", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_mean <- paste0(OUTUT_CSV_PATHNAME, filename_mean)
  if (file.exists(OUTPUTPATH_mean)) {
    print("selected file based on mean already exists")
  } else {
    write.csv(df_selected_Mean,
              file = OUTPUTPATH_mean,
              row.names = FALSE)
  }
}
if (Flag_8median) {
  filename_median <- paste0("Selected_median", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_median <- paste0(OUTUT_CSV_PATHNAME, filename_median)
  if (file.exists(OUTPUTPATH_median)) {
    print("selected file based on median already exists")
  } else {
    write.csv(df_selected_Median,
              file = OUTPUTPATH_median,
              row.names = FALSE)
  }
}
if (Flag_8Fequency) {
  filename_frequency <- paste0("Selected_frequency", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_frequency <- paste0(OUTUT_CSV_PATHNAME, filename_frequency)
  if (file.exists(OUTPUTPATH_frequency)) {
    print("selected file based on frequency already exists")
  } else {
    write.csv(df_process_Output_freq,
              file = OUTPUTPATH_frequency,
              row.names = FALSE)
  }
}
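
The three branches above repeat the same check-then-write pattern. A shared helper could reduce the duplication; the sketch below uses a hypothetical function name (`write_csv_if_absent`) and assumes the same `OUTUT_CSV_PATHNAME` prefix and naming convention as the code above:

```r
# Sketch of a shared helper for the write-if-absent pattern above.
# "write_csv_if_absent" is a hypothetical name, not part of the original script.
write_csv_if_absent <- function(df, method_label, n_features, out_dir) {
  filename <- paste0("Selected_", method_label, "_", n_features, "_Features.csv")
  outpath  <- paste0(out_dir, filename)
  if (file.exists(outpath)) {
    message("selected file based on ", method_label, " already exists")
  } else {
    write.csv(df, file = outpath, row.names = FALSE)
  }
  invisible(outpath)
}

# Usage, mirroring the three flags above:
# if (Flag_8mean)     write_csv_if_absent(df_selected_Mean,       "mean",      INPUT_NUMBER_FEATURES, OUTUT_CSV_PATHNAME)
# if (Flag_8median)   write_csv_if_absent(df_selected_Median,     "median",    INPUT_NUMBER_FEATURES, OUTUT_CSV_PATHNAME)
# if (Flag_8Fequency) write_csv_if_absent(df_process_Output_freq, "frequency", INPUT_NUMBER_FEATURES, OUTUT_CSV_PATHNAME)
```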

Phenotype Data Frame

# Flag for phenotype data output:
# if set to TRUE, write the file; first check whether the file already exists at the
# given path, and only write it if it does not.
# if set to FALSE, do not output the phenotype file.
# NOTE: the phenotype file is selected from "merged_df_raw".

phenotypeDF <- merged_df_raw[, colnames(phenoticPart_RAW)]
print(head(phenotypeDF))
##                                barcodes RID.a     prop.B    prop.NK   prop.CD4T  prop.CD8T  prop.Mono prop.Neutro prop.Eosino       DX  age.now PTGENDER  ABETA   TAU  PTAU          PC1           PC2
## 200223270003_R02C01 200223270003_R02C01  2190 0.03164651 0.03609239 0.010771839 0.01481567 0.06533409   0.8413395           0      MCI 82.40000     Male  963.2 341.5 35.48 -0.214185447  1.470293e-02
## 200223270003_R03C01 200223270003_R03C01  4080 0.03556363 0.04697771 0.002321312 0.06381941 0.04901806   0.8022999           0       CN 78.60000   Female  950.6 295.9 28.08 -0.172761185  5.745834e-02
## 200223270003_R06C01 200223270003_R06C01  4505 0.07129589 0.04412218 0.037684081 0.11457236 0.08745402   0.6448715           0       CN 80.40000   Female 1705.0 353.2 28.49 -0.003667305  8.372861e-02
## 200223270003_R07C01 200223270003_R07C01  1010 0.02081699 0.07117668 0.040966085 0.00000000 0.04459325   0.8224470           0 Dementia 78.16441     Male  493.3 272.8 22.75 -0.186779607 -1.117250e-02
## 200223270006_R01C01 200223270006_R01C01  4226 0.02680465 0.04767947 0.128514873 0.09085886 0.07419209   0.6319501           0      MCI 62.90000   Female 1705.0 253.1 22.84  0.026814649  1.650735e-05
## 200223270006_R04C01 200223270006_R04C01  1190 0.07063013 0.05250647 0.064529118 0.04309168 0.08796080   0.6812818           0       CN 80.67796   Female 1336.0 439.3 40.78 -0.037862929  1.571950e-02
##                              PC3   ageGroup ageGroupsq DX_num uniqueID  Horvath
## 200223270003_R02C01 -0.014043316  0.6606949 0.43651772      0        1 61.50365
## 200223270003_R03C01  0.005055871  0.2806949 0.07878961      0        1 69.26678
## 200223270003_R06C01  0.029143653  0.4606949 0.21223977      0        1 96.84418
## 200223270003_R07C01 -0.032302430  0.2371357 0.05623333      1        1 61.76446
## 200223270006_R01C01  0.052947950 -1.2893051 1.66230770      0        1 59.33885
## 200223270006_R04C01 -0.008685676  0.4884909 0.23862336      0        1 70.27197
OUTPUTPATH_phenotypePart <- paste0(OUTUT_CSV_PATHNAME, "PhenotypePart_df.csv")

if(phenoOutPUt_FLAG ){
  if (file.exists(OUTPUTPATH_phenotypePart)) {
  print("Phenotype File already exists")} 
  else {
  write.csv(phenotypeDF, file = OUTPUTPATH_phenotypePart, row.names = FALSE)
  }
}
## [1] "Phenotype File already exists"

9. Selected Feature Performance

9.1 Selected Based on Mean

9.1.1 Input Feature For Evaluation

Performance of the output features selected by the mean method

processed_dataFrame <- df_selected_Mean
processed_data <- output_mean_process

AfterProcess_FeatureName <- selected_impAvg_ordered_NAME
print(head(output_mean_process))
## # A tibble: 6 × 251
##   DX            PC1 cg24861747 cg06864789 cg01013522 cg04124201 cg15775217 cg27114706 cg23698271 cg25174111 cg02356645 cg14780448 cg26739327 cg23836570 cg02078724 cg18037388 cg20507276 cg12080266
##   <fct>       <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CN       -0.173        0.431     0.461       0.886      0.331      0.917     0.936       0.911      0.857      0.583     0.670      0.769      0.543       0.290      0.755     0.387       0.945
## 2 CN       -0.00367      0.807     0.875       0.543      0.324      0.604     0.929       0.905      0.257      0.570     0.621      0.873      0.0327      0.281      0.729     0.480       0.936
## 3 Dementia -0.187        0.335     0.490       0.843      0.433      0.906     0.479       0.880      0.190      0.568     0.0443     0.834      0.599       0.274      0.239     0.0226      0.640
## 4 CN       -0.0379       0.600     0.0542      0.824      0.307      0.638     0.930       0.528      0.205      0.919     0.913      0.105      0.573       0.258      0.263     0.0357      0.575
## 5 Dementia -0.139        0.773     0.835       0.512      0.375      0.570     0.0419      0.910      0.866      0.907     0.911      0.757      0.918       0.273      0.845     0.235       0.539
## 6 CN       -0.213        0.731     0.374       0.492      0.373      0.886     0.947       0.903      0.203      0.895     0.655      0.0827     0.502       0.285      0.722     0.526       0.554
## # ℹ 233 more variables: cg00999469 <dbl>, cg07152869 <dbl>, cg18339359 <dbl>, cg16390578 <dbl>, cg04242342 <dbl>, cg01680303 <dbl>, cg20218135 <dbl>, cg25561557 <dbl>, cg03172493 <dbl>, PC2 <dbl>,
## #   cg19503462 <dbl>, cg13885788 <dbl>, cg18662228 <dbl>, cg22274273 <dbl>, cg06697310 <dbl>, cg05096415 <dbl>, cg07584620 <dbl>, cg03084184 <dbl>, cg11314779 <dbl>, cg12279734 <dbl>,
## #   cg06378561 <dbl>, cg12012426 <dbl>, cg17419220 <dbl>, cg09650803 <dbl>, cg10058204 <dbl>, cg04109990 <dbl>, cg12776173 <dbl>, cg05841700 <dbl>, cg11358878 <dbl>, cg10542624 <dbl>,
## #   cg06870118 <dbl>, cg14252149 <dbl>, cg10701746 <dbl>, cg11787167 <dbl>, cg23916408 <dbl>, cg05749243 <dbl>, cg22901347 <dbl>, cg03982462 <dbl>, cg12858518 <dbl>, cg17044529 <dbl>,
## #   cg09216282 <dbl>, cg24859648 <dbl>, cg02217425 <dbl>, cg24065597 <dbl>, cg17118775 <dbl>, cg15591384 <dbl>, cg17296678 <dbl>, cg26901661 <dbl>, cg27341708 <dbl>, cg26983017 <dbl>,
## #   cg12306781 <dbl>, cg09584650 <dbl>, cg24851651 <dbl>, cg04867412 <dbl>, PC3 <dbl>, cg04771146 <dbl>, cg23350716 <dbl>, cg05373298 <dbl>, cg02901522 <dbl>, cg02095601 <dbl>, cg02872767 <dbl>,
## #   cg19555075 <dbl>, cg00421199 <dbl>, cg00322003 <dbl>, cg11716267 <dbl>, cg18526121 <dbl>, cg03392100 <dbl>, cg22681945 <dbl>, cg11834635 <dbl>, cg12074150 <dbl>, cg13226272 <dbl>, …
print(selected_impAvg_ordered_NAME)
##   [1] "PC1"        "cg24861747" "cg06864789" "cg01013522" "cg04124201" "cg15775217" "cg27114706" "cg23698271" "cg25174111" "cg02356645" "cg14780448" "cg26739327" "cg23836570" "cg02078724" "cg18037388"
##  [16] "cg20507276" "cg12080266" "cg00999469" "cg07152869" "cg18339359" "cg16390578" "cg04242342" "cg01680303" "cg20218135" "cg25561557" "cg03172493" "PC2"        "cg19503462" "cg13885788" "cg18662228"
##  [31] "cg22274273" "cg06697310" "cg05096415" "cg07584620" "cg03084184" "cg11314779" "cg12279734" "cg06378561" "cg12012426" "cg17419220" "cg09650803" "cg10058204" "cg04109990" "cg12776173" "cg05841700"
##  [46] "cg11358878" "cg10542624" "cg06870118" "cg14252149" "cg10701746" "cg11787167" "cg23916408" "cg05749243" "cg22901347" "cg03982462" "cg12858518" "cg17044529" "cg09216282" "cg24859648" "cg02217425"
##  [61] "cg24065597" "cg17118775" "cg15591384" "cg17296678" "cg26901661" "cg27341708" "cg26983017" "cg12306781" "cg09584650" "cg24851651" "cg04867412" "PC3"        "cg04771146" "cg23350716" "cg05373298"
##  [76] "cg02901522" "cg02095601" "cg02872767" "cg19555075" "cg00421199" "cg00322003" "cg11716267" "cg18526121" "cg03392100" "cg22681945" "cg11834635" "cg12074150" "cg13226272" "cg26948066" "cg07456472"
##  [91] "cg22653957" "cg02389264" "cg12471283" "cg07138269" "cg02656016" "cg16268937" "cg10507965" "cg16715186" "cg02627240" "cg24104387" "cg12333628" "cg12689021" "cg03167407" "cg17623720" "cg25758034"
## [106] "cg18821122" "cg03115532" "cg09247979" "cg08584917" "cg13080267" "cg04218584" "cg27452255" "cg01280698" "cg08242313" "cg26007606" "cg04831745" "cg16089727" "cg12240569" "cg14924512" "cg03327352"
## [121] "cg03187614" "cg06012621" "cg11109139" "cg02932958" "cg04467639" "cg21575308" "cg04664583" "cg02495179" "cg14764203" "cg17906851" "cg19512141" "cg00156497" "cg16361249" "cg00977253" "cg21757617"
## [136] "cg15700429" "cg07951602" "cg16338321" "cg05377703" "cg11227702" "cg05161773" "cg02489327" "cg23432430" "cg14181112" "cg27187580" "cg12108278" "cg21533482" "cg02981548" "cg11173002" "cg10786572"
## [151] "cg20913114" "cg02302183" "cg00332268" "cg03359067" "cg03088219" "cg26889118" "cg00051154" "cg16536985" "cg06264882" "cg21986118" "cg04798314" "cg23813394" "cg16310958" "cg15730644" "cg05351360"
## [166] "cg11835797" "cg00841008" "cg12284872" "cg14465143" "cg00675157" "cg17348244" "cg07304760" "cg06624143" "cg26089705" "cg12702014" "cg04033559" "cg21501207" "cg14904299" "cg03057303" "cg12213037"
## [181] "cg22071943" "cg17429539" "cg21578644" "cg24422984" "cg13799572" "cg12556569" "cg12421087" "cg27286614" "cg07971231" "cg16733676" "cg27224751" "cg01130884" "cg16020483" "cg12925689" "cg05813498"
## [196] "cg19248407" "cg26474732" "cg05130642" "cg04845852" "cg16098618" "cg05138546" "cg17811452" "cg26081710" "cg01097733" "cg01608425" "cg17329602" "cg03640465" "cg17386240" "cg16527629" "cg12434901"
## [211] "cg26757229" "cg02823329" "cg16858433" "cg04768387" "cg03628603" "cg05059349" "cg04577745" "cg00648024" "cg23840008" "cg15399577" "cg08397053" "cg04970287" "cg24638099" "cg10666341" "cg05125667"
## [226] "cg14170504" "cg05321907" "cg20070588" "cg20678988" "cg10844498" "cg12466610" "cg15535896" "cg04073914" "cg11826549" "cg26052728" "cg06032337" "cg10829391" "cg27639199" "cg06002867" "cg16431720"
## [241] "age.now"    "cg20704148" "cg18861767" "cg17002338" "cg20094343" "cg11266396" "cg12293347" "cg25649515" "cg22251955" "cg15501526"
print(head(df_selected_Mean))
##                           DX          PC1 cg24861747 cg06864789 cg01013522 cg04124201 cg15775217 cg27114706 cg23698271 cg25174111 cg02356645 cg14780448 cg26739327 cg23836570 cg02078724 cg18037388
## 200223270003_R03C01       CN -0.172761185  0.4309505  0.4605312  0.8862821  0.3308589  0.9168327  0.9359259  0.9109565  0.8573844  0.5833923 0.67021018  0.7693268 0.54259383  0.2896133  0.7545086
## 200223270003_R06C01       CN -0.003667305  0.8071462  0.8751365  0.5425308  0.3241613  0.6042521  0.9285384  0.9051701  0.2567745  0.5701428 0.62073547  0.8727608 0.03267304  0.2805612  0.7294565
## 200223270003_R07C01 Dementia -0.186779607  0.3347317  0.4902033  0.8429862  0.4332693  0.9062231  0.4787397  0.8804362  0.1903803  0.5683381 0.04425741  0.8340445 0.59939745  0.2739571  0.2391659
##                     cg20507276 cg12080266 cg00999469 cg07152869 cg18339359 cg16390578 cg04242342 cg01680303 cg20218135 cg25561557 cg03172493         PC2 cg19503462 cg13885788 cg18662228 cg22274273
## 200223270003_R03C01 0.38721972  0.9450629  0.2857719   0.505063  0.9040272 0.20983422  0.8167892  0.1344941 0.64278153 0.03851635 0.63362492  0.05745834  0.4537684  0.9369476  0.8730153  0.4246379
## 200223270003_R06C01 0.47978438  0.9363381  0.2499229   0.835249  0.8552121 0.06389068  0.8040357  0.7573869 0.06509247 0.47259480 0.06148804  0.08372861  0.6997359  0.5163017  0.8602464  0.4196796
## 200223270003_R07C01 0.02261996  0.6398247  0.2819622   0.519430  0.3073106 0.23101450  0.8286115  0.4772204 0.65642359 0.43364249 0.64562298 -0.01117250  0.7189778  0.9183376  0.8683578  0.4164100
##                     cg06697310 cg05096415 cg07584620 cg03084184 cg11314779 cg12279734 cg06378561 cg12012426 cg17419220 cg09650803 cg10058204 cg04109990 cg12776173 cg05841700 cg11358878 cg10542624
## 200223270003_R03C01  0.8653044  0.5177819  0.3763980  0.7877128  0.8966100  0.1494651  0.9377503  0.9434768 0.43470227  0.8954464  0.5834496  0.6476604  0.8730635  0.9146488 0.83252951 0.02189577
## 200223270003_R06C01  0.2405168  0.6288426  0.8530961  0.4546397  0.8908661  0.8760759  0.5154019  0.9220044 0.02781411  0.9113477  0.0549494  0.6692040  0.7009491  0.3737990 0.87521203 0.54330620
## 200223270003_R07C01  0.8479193  0.6060271  0.3888623  0.7812413  0.9048316  0.8674214  0.9403569  0.9241284 0.42803809  0.2518414  0.5689591  0.9024920  0.1136716  0.5046468 0.08917903 0.54991492
##                     cg06870118 cg14252149 cg10701746 cg11787167 cg23916408 cg05749243  cg22901347 cg03982462 cg12858518 cg17044529 cg09216282 cg24859648 cg02217425 cg24065597 cg17118775 cg15591384
## 200223270003_R03C01  0.8100144 0.02450779  0.4868342 0.04673831  0.9154993  0.9209685 0.001690332  0.6023731  0.9285252  0.9117895  0.9244259 0.44392797  0.1032503  0.2221098  0.5585676  0.7870275
## 200223270003_R06C01  0.7802055 0.02382413  0.4927257 0.32564508  0.8886255  0.9143061 0.103413834  0.8778458  0.9017533  0.9290636  0.9263996 0.03341185  0.6592850  0.7036129  0.2916054  0.7429614
## 200223270003_R07C01  0.7917257 0.56212480  0.8552180 0.43162543  0.8872447  0.9121180 0.632991482  0.8860227  0.9187879  0.9402858  0.9352308 0.43582347  0.8792021  0.2407676  0.2868948  0.8346279
##                     cg17296678 cg26901661 cg27341708 cg26983017 cg12306781 cg09584650 cg24851651 cg04867412          PC3 cg04771146 cg23350716 cg05373298 cg02901522 cg02095601 cg02872767 cg19555075
## 200223270003_R03C01  0.5653917  0.8754981 0.02613847 0.03145466  0.8663817 0.09661586 0.05358297  0.8796800  0.005055871  0.7648566  0.7876873 0.02652391  0.9372901  0.9161259  0.3886537  0.4921409
## 200223270003_R06C01  0.5272971  0.9021064 0.86893582 0.84677625  0.8027798 0.52399749 0.05968923  0.4497115  0.029143653  0.3125007  0.6960544 0.83538124  0.4954978  0.2233062  0.9099575  0.4261618
## 200223270003_R07C01  0.7661613  0.8556831 0.02642300 0.53922255  0.8787250 0.11587211 0.60864179  0.4445373 -0.032302430  0.2909958  0.7387498 0.89506024  0.9381188  0.8978191  0.8603283  0.4694729
##                     cg00421199 cg00322003 cg11716267 cg18526121 cg03392100 cg22681945 cg11834635 cg12074150 cg13226272 cg26948066 cg07456472 cg22653957 cg02389264 cg12471283 cg07138269 cg02656016
## 200223270003_R03C01  0.8532461  0.5702070 0.04959702  0.4762313  0.9227394  0.8388195  0.8880887 0.18602738  0.5410002  0.5026045  0.5856904  0.6442184  0.7900942  0.8658731  0.9426707  0.2355680
## 200223270003_R06C01  0.8891803  0.3077122 0.49143010  0.4833367  0.8902340  0.8700500  0.2493491 0.14231506  0.4437070  0.9101976  0.3886482  0.9531308  0.7789974  0.6963410  0.5057781  0.9052318
## 200223270003_R07C01  0.8937751  0.6104341 0.45857830  0.7761450  0.4359657  0.3344105  0.2210428 0.09201303  0.0265215  0.9379543  0.9186405  0.6534542  0.4174463  0.6680611  0.9400527  0.8653682
##                     cg16268937 cg10507965 cg16715186 cg02627240 cg24104387 cg12333628 cg12689021 cg03167407 cg17623720 cg25758034 cg18821122 cg03115532 cg09247979 cg08584917 cg13080267 cg04218584
## 200223270003_R03C01  0.8931712  0.4010973  0.7946153 0.57129408  0.5339034  0.9092861  0.7449475  0.7610292  0.8988624  0.6649219  0.5901603  0.8659608  0.5706177  0.9019732 0.78371483  0.8971263
## 200223270003_R06C01  0.9034556  0.4033691  0.8124316 0.05309659  0.3007614  0.5084647  0.7872237  0.3087606  0.8172384  0.2393844  0.5779620  0.8533871  0.5090215  0.9187789 0.09436069  0.8491768
## 200223270003_R07C01  0.8928450  0.3869543  0.7773263 0.52179136  0.7509780  0.5229394  0.7523141  0.2455453  0.8226085  0.7071501  0.9251431  0.4416574  0.5066661  0.6007449 0.09351259  0.9008137
##                     cg27452255 cg01280698 cg08242313 cg26007606 cg04831745 cg16089727 cg12240569 cg14924512 cg03327352 cg03187614 cg06012621 cg11109139 cg02932958 cg04467639 cg21575308 cg04664583
## 200223270003_R03C01  0.6593379 0.88462009  0.8953645  0.5615550 0.71214149 0.54996692 0.02690547  0.9160885  0.8786878  0.8826518  0.8579519  0.6350109  0.4210489  0.6400206 0.44702405  0.5881190
## 200223270003_R06C01  0.9012217 0.88471320  0.8573493  0.1463111 0.06871768 0.05876736 0.46030640  0.9088414  0.3042310  0.5131472  0.5325037  0.6904482  0.3825995  0.5657041 0.44792570  0.9352717
## 200223270003_R07C01  0.8898635 0.06370005  0.8992114  0.8101800 0.90994644 0.85485461 0.86185839  0.9081681  0.8273211  0.5281030  0.6263080  0.6274025  0.7617081  0.6302917 0.02822675  0.9350230
##                     cg02495179 cg14764203 cg17906851 cg19512141 cg00156497 cg16361249 cg00977253 cg21757617 cg15700429 cg07951602 cg16338321 cg05377703 cg11227702 cg05161773 cg02489327 cg23432430
## 200223270003_R03C01  0.7373055  0.4683709  0.9529718  0.7903543  0.5194653 0.52843073  0.9145988  0.4429909  0.9114530  0.8766842  0.8294062  0.8213047 0.49184121  0.4154907  0.8616312  0.9455418
## 200223270003_R06C01  0.5588114  0.8916566  0.6462151  0.8404684  0.9024063 0.09039669  0.8944518  0.4472538  0.8838233  0.8918089  0.4918708  0.5152514 0.02543724  0.8526849  0.8777949  0.9418716
## 200223270003_R07C01  0.5273309  0.8714472  0.9553497  0.2202759  0.9067989 0.42039062  0.9150206  0.4339315  0.9095363  0.8706938  0.5245645  0.7773036 0.45150971  0.4259275  0.4205073  0.9426559
##                     cg14181112 cg27187580 cg12108278 cg21533482 cg02981548 cg11173002 cg10786572 cg20913114 cg02302183 cg00332268 cg03359067  cg03088219 cg26889118 cg00051154 cg16536985 cg06264882
## 200223270003_R03C01  0.1615405  0.6643576  0.9243869  0.8288469  0.5220037  0.5913599  0.5982086 0.80382984  0.9191148  0.9044887  0.8628564 0.007435243  0.9154836 0.08370609  0.5418687 0.43678655
## 200223270003_R06C01  0.3424621  0.6914924  0.9068995  0.6766373  0.5098965  0.1878736  0.0935115 0.03158439  0.8749250  0.5777209  0.8144536 0.120155222  0.9101336 0.61288950  0.8392044 0.43703442
## 200223270003_R07C01  0.2178314  0.9357074  0.9131367  0.6235932  0.5660985  0.5150840  0.8436837 0.81256840  0.8888247  0.5848006  0.8737908 0.826554308  0.5759967 0.07638127  0.8822891 0.02439581
##                     cg21986118 cg04798314 cg23813394 cg16310958 cg15730644 cg05351360 cg11835797 cg00841008 cg12284872 cg14465143 cg00675157 cg17348244 cg07304760 cg06624143 cg26089705 cg12702014
## 200223270003_R03C01  0.6571296 0.07119798 0.48811365  0.9300073  0.4353906 0.03855181  0.9007408 0.61899333  0.7414569  0.5543068  0.9242325 0.81793075  0.5798534  0.4899758 0.50810373  0.7848681
## 200223270003_R06C01  0.7034445 0.09248843 0.02943436  0.9228871  0.8763048 0.76395533  0.8944957 0.05401588  0.7725267  0.2702875  0.9254708 0.07241099  0.5575516  0.9107688 0.03322136  0.8065993
## 200223270003_R07C01  0.9055894 0.06972566 0.92935625  0.8539019  0.4833709 0.77000888  0.8168544 0.90769205  0.7573369  0.2621492  0.5447244 0.78025001  0.9195617  0.9217350 0.03118009  0.7458594
##                     cg04033559 cg21501207 cg14904299 cg03057303 cg12213037 cg22071943 cg17429539 cg21578644 cg24422984 cg13799572 cg12556569 cg12421087 cg27286614 cg07971231 cg16733676 cg27224751
## 200223270003_R03C01  0.8768243  0.6813712  0.2712472  0.8923039   0.248785  0.2442648  0.7100923  0.9260863  0.5462594  0.8449584 0.03924599  0.5399655  0.5933858  0.8406145  0.8904541 0.03214912
## 200223270003_R06C01  0.8257388  0.4747229  0.8364544  0.4954311   0.812695  0.2644581  0.7660838  0.9159726  0.5193121  0.4409219 0.48636893  0.5400348  0.6348795  0.8447914  0.1698111 0.83123722
## 200223270003_R07C01  0.8900962  0.7422003  0.8193867  0.4695066   0.506374  0.2599947  0.6984969  0.9178001  0.1970387  0.8516975 0.46498877  0.5291975  0.9468370  0.8874706  0.9203317 0.79732117
##                     cg01130884 cg16020483 cg12925689 cg05813498 cg19248407 cg26474732 cg05130642 cg04845852 cg16098618 cg05138546 cg17811452 cg26081710 cg01097733 cg01608425 cg17329602 cg03640465
## 200223270003_R03C01  0.6230659  0.1673606 0.38196419  0.9039353  0.8313131  0.8184088  0.8644077  0.9212268  0.2571464  0.6230487 0.82740141  0.9198212  0.6753081  0.9264388  0.8189317  0.2531644
## 200223270003_R06C01  0.2847748  0.1209622 0.02873309  0.6252849  0.8525281  0.7358417  0.3661324  0.5118209  0.6899734  0.8963047 0.09338396  0.8801892  0.9131513  0.8887753  0.8478185  0.2904433
## 200223270003_R07C01  0.2313285  0.2499647 0.38592071  0.9086932  0.8467857  0.7509296  0.3039272  0.9034373  0.6488005  0.9057159 0.79817238  0.9153264  0.6832952  0.9065432  0.8596400  0.9024530
##                     cg17386240 cg16527629 cg12434901 cg26757229 cg02823329 cg16858433 cg04768387 cg03628603 cg05059349 cg04577745 cg00648024 cg23840008 cg15399577 cg08397053 cg04970287 cg24638099
## 200223270003_R03C01  0.7144809  0.4365003  0.8458468  0.1422661  0.6464005  0.9194211  0.9465814  0.9157246 0.04507417  0.2681033 0.40202875 0.66547425  0.8785443 0.04199567  0.8875750  0.4262170
## 200223270003_R06C01  0.8074824  0.0708336  0.8299579  0.7933794  0.9633930  0.9271632  0.9098563  0.8851075 0.03898752  0.8570624 0.05579011 0.88483246  0.8703169 0.04437741  0.4651667  0.8787392
## 200223270003_R07C01  0.7227918  0.4492586  0.8482994  0.8074830  0.6617541  0.9288986  0.9413240  0.8923890 0.85329923  0.9002276 0.03708944 0.09020907  0.8968856 0.59796746  0.9092326  0.8682765
##                     cg10666341 cg05125667 cg14170504 cg05321907 cg20070588 cg20678988 cg10844498 cg12466610 cg15535896 cg04073914 cg11826549 cg26052728 cg06032337 cg10829391 cg27639199 cg06002867
## 200223270003_R03C01  0.6731062 0.54151552 0.02236650  0.1782629  0.5057088  0.8548886  0.1391318 0.59131778  0.9253926 0.03089677 0.04794983  0.1513937  0.5657198  0.5929616 0.67552763 0.84888752
## 200223270003_R06C01  0.6443180 0.49090787 0.02988245  0.8427929  0.8654344  0.7786685  0.1385549 0.06939623  0.3320191 0.89962516 0.03672380  0.5254754  0.5653758  0.9411947 0.06233093 0.02698175
## 200223270003_R07C01  0.8970292 0.01590936 0.48543531  0.8320504  0.8425849  0.8260541  0.7374725 0.04527733  0.9409104 0.47195215 0.51173417  0.5600724  0.5229594  0.9322956 0.05701332 0.48042117
##                     cg16431720  age.now cg20704148 cg18861767 cg17002338 cg20094343 cg11266396 cg12293347 cg25649515 cg22251955 cg15501526
## 200223270003_R03C01  0.8692449 78.60000 0.02409027  0.7847380  0.2684163  0.7128750 0.01905761  0.9253031 0.92357530 0.02254441  0.6319253
## 200223270003_R06C01  0.8773137 80.40000 0.02580923  0.4734572  0.2811103  0.3291595 0.53122014  0.9176094 0.58958387 0.02714054  0.7435100
## 200223270003_R07C01  0.8988328 78.16441 0.47357786  0.7312175  0.2706349  0.4013815 0.02421064  0.6028463 0.02958575 0.89577950  0.7756577
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

9.1.2 Logistic Regression Model

9.1.2.1 Logistic Regression Model Training

df_LRM1 <- processed_data
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)  
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 221 251
dim(testData)
## [1]  94 251
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Mean_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Mean_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        6
##   Dementia  2       22
##                                           
##                Accuracy : 0.9149          
##                  95% CI : (0.8392, 0.9625)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 5.403e-07       
##                                           
##                   Kappa : 0.7878          
##                                           
##  Mcnemar's Test P-Value : 0.2888          
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.7857          
##          Pos Pred Value : 0.9143          
##          Neg Pred Value : 0.9167          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7447          
##       Balanced Accuracy : 0.8777          
##                                           
##        'Positive' Class : CN              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Mean_LRM1_Accuracy <- cm_FeatEval_Mean_LRM1$overall["Accuracy"]
cm_FeatEval_Mean_LRM1_Kappa <- cm_FeatEval_Mean_LRM1$overall["Kappa"]

print(cm_FeatEval_Mean_LRM1_Accuracy)
##  Accuracy 
## 0.9148936
print(cm_FeatEval_Mean_LRM1_Kappa)
##     Kappa 
## 0.7878104
print(model_LRM1)
## glmnet 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa      
##   0.10   0.006203963  0.8824242   0.69924405
##   0.10   0.019618654  0.8778788   0.68707894
##   0.10   0.062039630  0.8325253   0.54918871
##   0.55   0.006203963  0.7647475   0.39111910
##   0.55   0.019618654  0.7557576   0.37090350
##   0.55   0.062039630  0.6787879   0.08521446
##   1.00   0.006203963  0.7016162   0.25919114
##   1.00   0.019618654  0.6743434   0.13505598
##   1.00   0.062039630  0.6696970  -0.04592241
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.006203963.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Mean_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Mean_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7597531
FeatEval_Mean_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Mean_mean_accuracy_cv_LRM1)
## [1] 0.7597531
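Averaging accuracy over the entire tuning grid mixes well-tuned and poorly-tuned settings; the cross-validated accuracy of the selected tuning parameters is often a more informative summary. A sketch using the results caret already stores (`best_row` is a hypothetical name, not part of the original evaluation):

```r
# CV accuracy at the chosen alpha/lambda, pulled from caret's stored grid
best_row <- subset(model_LRM1$results,
                   alpha  == model_LRM1$bestTune$alpha &
                   lambda == model_LRM1$bestTune$lambda)
print(best_row$Accuracy)
```

In the run above this would report 0.8824 (alpha = 0.1, lambda = 0.006203963), versus the grid-wide mean of 0.7598.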
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG ==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9789
## [1] "The auc value is:"
## Area under the curve: 0.9789

if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  # match curve colours: the first curve is blue, later curves use col = i + 1
  legend("bottomright", legend = classes, col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_LRM1_AUC <- mean_auc
}
print(FeatEval_Mean_LRM1_AUC)
## Area under the curve: 0.9789
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC1         100.00
## PC2          54.24
## cg02872767   35.62
## cg09216282   34.55
## cg11787167   33.01
## cg19503462   29.43
## cg12080266   29.30
## cg01680303   28.39
## cg02356645   28.26
## cg12108278   28.01
## cg06378561   27.88
## cg01013522   27.42
## cg03084184   27.22
## cg12858518   26.69
## cg06864789   26.04
## cg07152869   25.78
## cg26739327   24.54
## cg03982462   24.32
## cg12306781   23.94
## cg02932958   23.61
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
##        Overall
## 1   4.67156947
## 2   2.53382228
## 3   1.66381904
## 4   1.61380434
## 5   1.54231245
## 6   1.37474545
## 7   1.36890908
## 8   1.32633118
## 9   1.32033646
## 10  1.30832861
## 11  1.30265053
## 12  1.28088946
## 13  1.27170395
## 14  1.24674246
## 15  1.21632809
## 16  1.20417705
## 17  1.14637480
## 18  1.13595634
## 19  1.11850390
## 20  1.10316036
## 21  1.09638557
## 22  1.05383610
## 23  1.04978214
## 24  1.04711132
## 25  1.03971877
## 26  1.00770340
## 27  0.99691845
## 28  0.99059722
## 29  0.97573584
## 30  0.95992174
## 31  0.94719418
## 32  0.94401614
## 33  0.93890948
## 34  0.91865292
## 35  0.89779421
## 36  0.89774313
## 37  0.89747023
## 38  0.87900461
## 39  0.87446591
## 40  0.87267504
## 41  0.86824473
## 42  0.83348120
## 43  0.82672144
## 44  0.81628932
## 45  0.81585458
## 46  0.80947795
## 47  0.80755357
## 48  0.80690721
## 49  0.80281251
## 50  0.79249155
## 51  0.79150505
## 52  0.78228751
## 53  0.78059931
## 54  0.77768503
## 55  0.77756191
## 56  0.77142458
## 57  0.76540017
## 58  0.76132436
## 59  0.75893274
## 60  0.75633711
## 61  0.75589451
## 62  0.73168860
## 63  0.73077851
## 64  0.72661774
## 65  0.72337182
## 66  0.72239268
## 67  0.71712888
## 68  0.71324429
## 69  0.71170218
## 70  0.70976787
## 71  0.69739291
## 72  0.69146592
## 73  0.68893704
## 74  0.68874647
## 75  0.68855634
## 76  0.67975869
## 77  0.67711903
## 78  0.67336530
## 79  0.65269484
## 80  0.64729708
## 81  0.64502626
## 82  0.63741400
## 83  0.60594639
## 84  0.59670543
## 85  0.58963620
## 86  0.58563046
## 87  0.58222279
## 88  0.58112563
## 89  0.57850345
## 90  0.57167721
## 91  0.57022821
## 92  0.56709181
## 93  0.56692575
## 94  0.55664918
## 95  0.55485681
## 96  0.54964757
## 97  0.54569754
## 98  0.54294596
## 99  0.54266597
## 100 0.53729873
## 101 0.53350806
## 102 0.53002931
## 103 0.52800739
## 104 0.52583762
## 105 0.51988332
## 106 0.51660640
## 107 0.51604877
## 108 0.51553153
## 109 0.50898619
## 110 0.50669073
## 111 0.50154464
## 112 0.49631371
## 113 0.48914541
## 114 0.48911797
## 115 0.48817683
## 116 0.48479954
## 117 0.48423523
## 118 0.48284740
## 119 0.47848101
## 120 0.47519286
## 121 0.45448869
## 122 0.45440447
## 123 0.45426207
## 124 0.45402368
## 125 0.45344073
## 126 0.45272620
## 127 0.44429602
## 128 0.43160774
## 129 0.42087994
## 130 0.40822258
## 131 0.40317000
## 132 0.40226331
## 133 0.40124150
## 134 0.38873524
## 135 0.38741974
## 136 0.38343648
## 137 0.38104780
## 138 0.37313107
## 139 0.37293254
## 140 0.37040196
## 141 0.36863039
## 142 0.36748697
## 143 0.36054972
## 144 0.35747729
## 145 0.35229098
## 146 0.34922645
## 147 0.34175180
## 148 0.33693547
## 149 0.33381778
## 150 0.33007956
## 151 0.32846898
## 152 0.32754490
## 153 0.32677575
## 154 0.31774664
## 155 0.31768325
## 156 0.31734206
## 157 0.30770483
## 158 0.29874822
## 159 0.29800282
## 160 0.29672308
## 161 0.28268691
## 162 0.27812244
## 163 0.27433595
## 164 0.26700329
## 165 0.26673425
## 166 0.26508682
## 167 0.25990866
## 168 0.25631071
## 169 0.25363692
## 170 0.25076745
## 171 0.24977178
## 172 0.24299963
## 173 0.23743311
## 174 0.23475892
## 175 0.22604967
## 176 0.21978431
## 177 0.21793264
## 178 0.21668550
## 179 0.21197971
## 180 0.20709225
## 181 0.20148821
## 182 0.19713515
## 183 0.19500270
## 184 0.19197226
## 185 0.19028246
## 186 0.17995258
## 187 0.16854231
## 188 0.16407372
## 189 0.14848694
## 190 0.14479331
## 191 0.14212863
## 192 0.14153775
## 193 0.13886688
## 194 0.13782026
## 195 0.13099809
## 196 0.11400959
## 197 0.10322997
## 198 0.09469875
## 199 0.09425805
## 200 0.09368251
## 201 0.08618140
## 202 0.08053410
## 203 0.07874315
## 204 0.06772311
## 205 0.06761376
## 206 0.05607018
## 207 0.04046362
## 208 0.03585420
## 209 0.03344539
## 210 0.02662688
## 211 0.02201029
## 212 0.02018323
## 213 0.02001709
## 214 0.01892805
## 215 0.00794430
## 216 0.00000000
## 217 0.00000000
## 218 0.00000000
## 219 0.00000000
## 220 0.00000000
## 221 0.00000000
## 222 0.00000000
## 223 0.00000000
## 224 0.00000000
## 225 0.00000000
## 226 0.00000000
## 227 0.00000000
## 228 0.00000000
## 229 0.00000000
## 230 0.00000000
## 231 0.00000000
## 232 0.00000000
## 233 0.00000000
## 234 0.00000000
## 235 0.00000000
## 236 0.00000000
## 237 0.00000000
## 238 0.00000000
## 239 0.00000000
## 240 0.00000000
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}
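For the multi-class case (METHOD_FEATURE_FLAG == 1), reshape2 is in maintenance mode; an equivalent long-format reshape with tidyr (assuming tidyr is installed; `importance_long` is a hypothetical name) would be:

```r
library(tidyr)
library(dplyr)
# Same result as melt(): one row per (Feature, Class) importance value
importance_long <- importance_model_LRM1_df %>%
  dplyr::select(-MaxImportance) %>%
  pivot_longer(-Feature, names_to = "Class", values_to = "Importance")
```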

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features based on the max-importance method:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

9.1.2.2 Model Diagnosis & Improvement

9.1.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia 
##      221       94
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia 
## 0.7015873 0.2984127
table(trainData$DX)
## 
##       CN Dementia 
##      155       66
prop.table(table(trainData$DX))
## 
##        CN  Dementia 
## 0.7013575 0.2986425
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio, the ratio of the number of samples in the majority class to the number in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 2.351064
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 2.348485
  • Let’s run a Chi-square test, which determines whether the class distribution deviates significantly from a balanced distribution. The p-value from the test indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 51.203, df = 1, p-value = 8.328e-13
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 35.842, df = 1, p-value = 2.14e-09
Address Class Imbalance Using SMOTE (NOT finalized yet; may need further improvement)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)

# Extract the new balanced dataset
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia 
##      155      132
dim(balanced_data_LGR_1)
## [1] 287 251
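With `dup_size = 1`, smotefamily::SMOTE generates roughly one synthetic row per minority sample, which still leaves the classes unequal (155 vs. 132 above). A sketch of choosing the integer `dup_size` closest to balancing the classes (`balanced_dup_size` is a hypothetical helper, not part of the original code):

```r
# SMOTE grows the minority class to roughly n_minor * (1 + dup_size),
# so pick the dup_size whose result is nearest the majority count
balanced_dup_size <- function(target) {
  counts  <- table(target)
  n_major <- max(counts)
  n_minor <- min(counts)
  max(1, round((n_major - n_minor) / n_minor))
}
# balanced_dup_size(trainData$DX)  # 155 vs 66 -> round(89/66) = 1
```

For this split the helper returns 1, the value already used, confirming that the residual imbalance is unavoidable with an integer `dup_size`.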
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       63        5
##   Dementia  3       23
##                                           
##                Accuracy : 0.9149          
##                  95% CI : (0.8392, 0.9625)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 5.403e-07       
##                                           
##                   Kappa : 0.7923          
##                                           
##  Mcnemar's Test P-Value : 0.7237          
##                                           
##             Sensitivity : 0.9545          
##             Specificity : 0.8214          
##          Pos Pred Value : 0.9265          
##          Neg Pred Value : 0.8846          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6702          
##    Detection Prevalence : 0.7234          
##       Balanced Accuracy : 0.8880          
##                                           
##        'Positive' Class : CN              
## 
print(model_LRM2)
## glmnet 
## 
## 287 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 230, 230, 229, 230, 229 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0002423556  0.9337568  0.8665565
##   0.10   0.0024235561  0.9337568  0.8665565
##   0.10   0.0242355605  0.9337568  0.8665565
##   0.55   0.0002423556  0.8747126  0.7495333
##   0.55   0.0024235561  0.8747126  0.7495333
##   0.55   0.0242355605  0.8433757  0.6875506
##   1.00   0.0002423556  0.8153660  0.6333284
##   1.00   0.0024235561  0.8188143  0.6396592
##   1.00   0.0242355605  0.7877193  0.5766047
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.02423556.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8684412
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC1         100.00
## PC2          45.04
## cg02872767   33.08
## cg11787167   33.03
## cg09216282   31.82
## cg19503462   30.96
## cg06378561   29.09
## cg12108278   29.00
## cg03084184   28.27
## cg01680303   28.18
## cg01013522   27.48
## cg07152869   27.37
## cg12080266   27.35
## cg12858518   26.84
## cg02356645   26.09
## cg26739327   26.08
## cg03982462   24.73
## cg23432430   24.67
## cg06864789   24.04
## cg04109990   22.66
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
##         Overall
## 1   3.896403496
## 2   1.754908564
## 3   1.288913073
## 4   1.287160892
## 5   1.239768042
## 6   1.206468623
## 7   1.133343390
## 8   1.129909798
## 9   1.101691626
## 10  1.097913397
## 11  1.070638685
## 12  1.066508382
## 13  1.065703093
## 14  1.045882431
## 15  1.016458375
## 16  1.016109536
## 17  0.963399104
## 18  0.961400970
## 19  0.936752532
## 20  0.882835903
## 21  0.878146450
## 22  0.873266179
## 23  0.848110895
## 24  0.847134908
## 25  0.832162610
## 26  0.828896065
## 27  0.812263407
## 28  0.810671849
## 29  0.806531822
## 30  0.803109792
## 31  0.799856085
## 32  0.799228744
## 33  0.798901576
## 34  0.759177334
## 35  0.741612130
## 36  0.732935626
## 37  0.729109427
## 38  0.711535139
## 39  0.709106308
## 40  0.700382423
## 41  0.691699544
## 42  0.690043705
## 43  0.674610394
## 44  0.669125397
## 45  0.663778460
## 46  0.663162257
## 47  0.658210134
## 48  0.654109906
## 49  0.648163129
## 50  0.644795104
## 51  0.625058487
## 52  0.623433493
## 53  0.620824154
## 54  0.618147045
## 55  0.616449154
## 56  0.612417030
## 57  0.608094410
## 58  0.605484614
## 59  0.600032989
## 60  0.590415992
## 61  0.585591554
## 62  0.584802249
## 63  0.582828377
## 64  0.582652021
## 65  0.577028084
## 66  0.572360926
## 67  0.568271261
## 68  0.566560873
## 69  0.556999735
## 70  0.554583274
## 71  0.545670629
## 72  0.531451947
## 73  0.528265726
## 74  0.525857992
## 75  0.522839793
## 76  0.504833183
## 77  0.497734929
## 78  0.497487834
## 79  0.497395205
## 80  0.493320816
## 81  0.492144875
## 82  0.491984046
## 83  0.489858600
## 84  0.481653201
## 85  0.479341229
## 86  0.476132589
## 87  0.474291426
## 88  0.471845333
## 89  0.471087518
## 90  0.461952953
## 91  0.454275665
## 92  0.450162267
## 93  0.443682397
## 94  0.441552569
## 95  0.435059009
## 96  0.433610608
## 97  0.428852441
## 98  0.423029256
## 99  0.419966886
## 100 0.416909890
## 101 0.415600062
## 102 0.414572832
## 103 0.413867906
## 104 0.413546298
## 105 0.407881189
## 106 0.405094355
## 107 0.403907118
## 108 0.400540545
## 109 0.400350010
## 110 0.395161028
## 111 0.393435036
## 112 0.389919398
## 113 0.387908213
## 114 0.379285215
## 115 0.369438892
## 116 0.367699167
## 117 0.364876171
## 118 0.362015181
## 119 0.361824711
## 120 0.357531042
## 121 0.357306792
## 122 0.353058454
## 123 0.337728023
## 124 0.337031391
## 125 0.336962185
## 126 0.336936603
## 127 0.335450718
## 128 0.335168325
## 129 0.332197952
## 130 0.329194390
## 131 0.326608604
## 132 0.319656980
## 133 0.312441820
## 134 0.309795580
## 135 0.305627995
## 136 0.303847618
## 137 0.303602475
## 138 0.297345411
## 139 0.294874386
## 140 0.292686781
## 141 0.284006018
## 142 0.281905017
## 143 0.281582947
## 144 0.275667395
## 145 0.275199450
## 146 0.272744065
## 147 0.268005435
## 148 0.263492210
## 149 0.263349295
## 150 0.262907121
## 151 0.258741586
## 152 0.256998965
## 153 0.255909971
## 154 0.255594125
## 155 0.255186143
## 156 0.249962688
## 157 0.249078740
## 158 0.248326478
## 159 0.232784521
## 160 0.225082506
## 161 0.218155053
## 162 0.217667572
## 163 0.212306129
## 164 0.211627374
## 165 0.195764766
## 166 0.192753127
## 167 0.192733789
## 168 0.190242767
## 169 0.188631039
## 170 0.178894734
## 171 0.177009523
## 172 0.170768425
## 173 0.169100099
## 174 0.168597199
## 175 0.166432458
## 176 0.161711629
## 177 0.161670364
## 178 0.161491611
## 179 0.154346025
## 180 0.145414935
## 181 0.144500345
## 182 0.141148522
## 183 0.129284701
## 184 0.125989107
## 185 0.124843859
## 186 0.124313927
## 187 0.119996553
## 188 0.115455692
## 189 0.113508961
## 190 0.110508178
## 191 0.109557595
## 192 0.092901796
## 193 0.092379051
## 194 0.084570197
## 195 0.080127469
## 196 0.075871039
## 197 0.069703746
## 198 0.068414340
## 199 0.068381720
## 200 0.065863321
## 201 0.061807694
## 202 0.059091054
## 203 0.055962061
## 204 0.055741828
## 205 0.047635267
## 206 0.040841863
## 207 0.038498925
## 208 0.030143392
## 209 0.025623720
## 210 0.025500178
## 211 0.024615332
## 212 0.017941921
## 213 0.017368721
## 214 0.008541186
## 215 0.004541769
## 216 0.004121476
## 217 0.000000000
## 218 0.000000000
## 219 0.000000000
## 220 0.000000000
## 221 0.000000000
## 222 0.000000000
## 223 0.000000000
## 224 0.000000000
## 225 0.000000000
## 226 0.000000000
## 227 0.000000000
## 228 0.000000000
## 229 0.000000000
## 230 0.000000000
## 231 0.000000000
## 232 0.000000000
## 233 0.000000000
## 234 0.000000000
## 235 0.000000000
## 236 0.000000000
## 237 0.000000000
## 238 0.000000000
## 239 0.000000000
## 240 0.000000000
## 241 0.000000000
## 242 0.000000000
## 243 0.000000000
## 244 0.000000000
## 245 0.000000000
## 246 0.000000000
## 247 0.000000000
## 248 0.000000000
## 249 0.000000000
## 250 0.000000000
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("The top 20 features based on the max-importance method:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9735
## [1] "The auc value is:"
## Area under the curve: 0.9735

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  # match curve colours: the first curve is blue, later curves use col = i + 1
  legend("bottomright", legend = classes, col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}

9.1.3. Elastic Net

9.1.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa      
##   0      0.00100000  0.7872727   0.35760190
##   0      0.05357895  0.7872727   0.35760190
##   0      0.10615789  0.7872727   0.35760190
##   0      0.15873684  0.7872727   0.35760190
##   0      0.21131579  0.7872727   0.35760190
##   0      0.26389474  0.7872727   0.35760190
##   0      0.31647368  0.7872727   0.35760190
##   0      0.36905263  0.7872727   0.35760190
##   0      0.42163158  0.7872727   0.35760190
##   0      0.47421053  0.7872727   0.35760190
##   0      0.52678947  0.7872727   0.35760190
##   0      0.57936842  0.7872727   0.35760190
##   0      0.63194737  0.7872727   0.35760190
##   0      0.68452632  0.7872727   0.35760190
##   0      0.73710526  0.7872727   0.35760190
##   0      0.78968421  0.7872727   0.35760190
##   0      0.84226316  0.7872727   0.35760190
##   0      0.89484211  0.7872727   0.35760190
##   0      0.94742105  0.7872727   0.35760190
##   0      1.00000000  0.7872727   0.35760190
##   1      0.00100000  0.7243434   0.32524015
##   1      0.05357895  0.6561616  -0.07066799
##   1      0.10615789  0.7014141   0.00000000
##   1      0.15873684  0.7014141   0.00000000
##   1      0.21131579  0.7014141   0.00000000
##   1      0.26389474  0.7014141   0.00000000
##   1      0.31647368  0.7014141   0.00000000
##   1      0.36905263  0.7014141   0.00000000
##   1      0.42163158  0.7014141   0.00000000
##   1      0.47421053  0.7014141   0.00000000
##   1      0.52678947  0.7014141   0.00000000
##   1      0.57936842  0.7014141   0.00000000
##   1      0.63194737  0.7014141   0.00000000
##   1      0.68452632  0.7014141   0.00000000
##   1      0.73710526  0.7014141   0.00000000
##   1      0.78968421  0.7014141   0.00000000
##   1      0.84226316  0.7014141   0.00000000
##   1      0.89484211  0.7014141   0.00000000
##   1      0.94742105  0.7014141   0.00000000
##   1      1.00000000  0.7014141   0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 1.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
FeatEval_Mean_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Mean_mean_accuracy_cv_ENM1)
## [1] 0.7437854
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

FeatEval_Mean_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.927601809954751"
print(FeatEval_Mean_ENM1_trainAccuracy)
## [1] 0.9276018
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Mean_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Mean_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       13
##   Dementia  0       15
##                                           
##                Accuracy : 0.8617          
##                  95% CI : (0.7751, 0.9243)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.0002470       
##                                           
##                   Kappa : 0.6184          
##                                           
##  Mcnemar's Test P-Value : 0.0008741       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.5357          
##          Pos Pred Value : 0.8354          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.7021          
##    Detection Prevalence : 0.8404          
##       Balanced Accuracy : 0.7679          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Mean_ENM1_Accuracy<-cm_FeatEval_Mean_ENM1$overall["Accuracy"]
cm_FeatEval_Mean_ENM1_Kappa<-cm_FeatEval_Mean_ENM1$overall["Kappa"]
print(cm_FeatEval_Mean_ENM1_Accuracy)
##  Accuracy 
## 0.8617021
print(cm_FeatEval_Mean_ENM1_Kappa)
##     Kappa 
## 0.6183635
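The headline statistics that `caret::confusionMatrix()` reports above can be recomputed directly from the 2x2 table of counts. This is an illustrative base-R sketch using the counts printed above (positive class = CN), not part of the pipeline:

```r
# Recompute accuracy, sensitivity, and specificity from the 2x2 confusion
# matrix above (rows = prediction, columns = reference); base-R sketch.
cm <- matrix(c(66, 0, 13, 15), nrow = 2,
             dimnames = list(Prediction = c("CN", "Dementia"),
                             Reference  = c("CN", "Dementia")))
accuracy    <- sum(diag(cm)) / sum(cm)                             # (66 + 15) / 94
sensitivity <- cm["CN", "CN"] / sum(cm[, "CN"])                    # positive class = CN
specificity <- cm["Dementia", "Dementia"] / sum(cm[, "Dementia"])
round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity), 4)
# accuracy = 0.8617, sensitivity = 1.0000, specificity = 0.5357
```

These match the `Accuracy`, `Sensitivity`, and `Specificity` values reported by `caret` above.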
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC1         100.00
## PC3          61.36
## PC2          53.66
## cg07152869   42.56
## cg19503462   41.32
## cg09216282   40.21
## cg04109990   37.13
## cg01013522   36.95
## cg02872767   36.43
## cg26739327   35.90
## cg26757229   35.79
## cg12858518   35.48
## cg11787167   35.17
## cg03982462   34.52
## cg06864789   34.39
## cg02356645   34.11
## cg00322003   34.05
## cg15775217   33.67
## cg04124201   33.28
## cg12306781   33.24
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)
# Note: dplyr::arrange() drops data.frame row names, so the ordered table
# below lists importance values without the CpG identifiers. To keep them,
# move the row names into a column (e.g. with tibble::rownames_to_column())
# before arranging.
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))

print(Ordered_importance_elastic_net_final_model1) 
  
}
##         Overall
## 1   0.583845082
## 2   0.358663612
## 3   0.313744529
## 4   0.249081661
## 5   0.241847065
## 6   0.235359003
## 7   0.217423847
## 8   0.216380402
## 9   0.213347330
## 10  0.210242939
## 11  0.209609162
## 12  0.207817912
## 13  0.205991389
## 14  0.202211767
## 15  0.201481930
## 16  0.199851504
## 17  0.199483912
## 18  0.197286498
## 19  0.195000994
## 20  0.194780346
## 21  0.193392775
## 22  0.191005608
## 23  0.184404050
## 24  0.184309973
## 25  0.183936615
## 26  0.183446566
## 27  0.182692881
## 28  0.181994616
## 29  0.180035968
## 30  0.176894502
## 31  0.176356480
## 32  0.174846367
## 33  0.174767917
## 34  0.171389594
## 35  0.171274460
## 36  0.168957144
## 37  0.167655374
## 38  0.165165567
## 39  0.165137609
## 40  0.163219502
## 41  0.162891940
## 42  0.161066107
## 43  0.157656658
## 44  0.157320053
## 45  0.155935190
## 46  0.154389569
## 47  0.153822373
## 48  0.153178157
## 49  0.153129265
## 50  0.152570497
## 51  0.152522648
## 52  0.151897491
## 53  0.149873637
## 54  0.148924045
## 55  0.148386338
## 56  0.148356966
## 57  0.146425930
## 58  0.146311016
## 59  0.145938416
## 60  0.145597874
## 61  0.143990100
## 62  0.143752628
## 63  0.143155278
## 64  0.141542651
## 65  0.141400944
## 66  0.141332614
## 67  0.141005648
## 68  0.137381820
## 69  0.137114562
## 70  0.136930060
## 71  0.136585785
## 72  0.134761636
## 73  0.134457700
## 74  0.133895498
## 75  0.132428702
## 76  0.131942587
## 77  0.129697658
## 78  0.129390135
## 79  0.128897323
## 80  0.128750762
## 81  0.128655581
## 82  0.127506112
## 83  0.127238036
## 84  0.126754787
## 85  0.126074762
## 86  0.125415724
## 87  0.124879753
## 88  0.124275051
## 89  0.123697373
## 90  0.122888180
## 91  0.122043841
## 92  0.121296380
## 93  0.121224658
## 94  0.120212503
## 95  0.119931922
## 96  0.119687605
## 97  0.118922375
## 98  0.118559820
## 99  0.117714676
## 100 0.116619129
## 101 0.116604634
## 102 0.116559316
## 103 0.115500426
## 104 0.113811412
## 105 0.113737477
## 106 0.113310329
## 107 0.113005948
## 108 0.112614028
## 109 0.112020794
## 110 0.110593730
## 111 0.109643036
## 112 0.109242871
## 113 0.109132504
## 114 0.109019736
## 115 0.108239123
## 116 0.108057481
## 117 0.107684353
## 118 0.105071786
## 119 0.104563899
## 120 0.104361299
## 121 0.104268828
## 122 0.104224543
## 123 0.103770519
## 124 0.103651483
## 125 0.102772081
## 126 0.102706217
## 127 0.102591896
## 128 0.102434397
## 129 0.102369391
## 130 0.101953700
## 131 0.101131389
## 132 0.098989445
## 133 0.098733808
## 134 0.097502221
## 135 0.097209415
## 136 0.097032900
## 137 0.096893189
## 138 0.096562236
## 139 0.096551040
## 140 0.095532385
## 141 0.095434687
## 142 0.095416336
## 143 0.095143743
## 144 0.094821616
## 145 0.094376627
## 146 0.094312761
## 147 0.093907967
## 148 0.093813767
## 149 0.093318962
## 150 0.092339285
## 151 0.092064364
## 152 0.091495580
## 153 0.091073715
## 154 0.090813518
## 155 0.090171520
## 156 0.089722369
## 157 0.089298092
## 158 0.088604670
## 159 0.088424043
## 160 0.088257922
## 161 0.088127230
## 162 0.088100782
## 163 0.087837652
## 164 0.087437102
## 165 0.086194687
## 166 0.085114865
## 167 0.085062473
## 168 0.084128377
## 169 0.083022637
## 170 0.082944323
## 171 0.082649009
## 172 0.082561252
## 173 0.082450822
## 174 0.082268863
## 175 0.081875928
## 176 0.081337700
## 177 0.081258300
## 178 0.080958908
## 179 0.079729046
## 180 0.079524897
## 181 0.079472126
## 182 0.079146832
## 183 0.079114997
## 184 0.077175645
## 185 0.076948369
## 186 0.076808043
## 187 0.075605941
## 188 0.074547361
## 189 0.074398210
## 190 0.073111178
## 191 0.072380853
## 192 0.072038844
## 193 0.071685480
## 194 0.071315677
## 195 0.070448093
## 196 0.070030855
## 197 0.069781073
## 198 0.069514475
## 199 0.068986606
## 200 0.068020094
## 201 0.068019857
## 202 0.067653889
## 203 0.067432658
## 204 0.067308693
## 205 0.066432769
## 206 0.065908202
## 207 0.065643969
## 208 0.064698810
## 209 0.064269298
## 210 0.063669757
## 211 0.062648742
## 212 0.061412438
## 213 0.061371618
## 214 0.059722758
## 215 0.058056431
## 216 0.057983603
## 217 0.057205965
## 218 0.056978923
## 219 0.056973436
## 220 0.056453357
## 221 0.056403820
## 222 0.055086745
## 223 0.054879567
## 224 0.054227068
## 225 0.053826596
## 226 0.053205131
## 227 0.053194055
## 228 0.050599669
## 229 0.050506744
## 230 0.048032463
## 231 0.047742437
## 232 0.047074565
## 233 0.046079273
## 234 0.045822167
## 235 0.044418207
## 236 0.043892295
## 237 0.042759620
## 238 0.037601715
## 239 0.031227914
## 240 0.028950797
## 241 0.021657779
## 242 0.019854030
## 243 0.015806508
## 244 0.013947155
## 245 0.012595065
## 246 0.011300388
## 247 0.008561576
## 248 0.007203081
## 249 0.003546909
## 250 0.001027120
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # across the three classes and rank features by that value.
  # Add a column holding the per-feature maximum importance.
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
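The `pmax()`-based max-importance ranking used in the block above can be illustrated on a small toy importance table. The values and CpG names below are hypothetical, chosen only to show the mechanics:

```r
# Toy per-class importance table (hypothetical values) illustrating the
# max-across-classes ranking: each feature gets its largest importance
# over CN, Dementia, and MCI, and features are sorted by that maximum.
imp <- data.frame(CN       = c(10, 80, 30),
                  Dementia = c(70, 20, 25),
                  MCI      = c(15,  5, 90),
                  row.names = c("cg_A", "cg_B", "cg_C"))
imp$Feature <- rownames(imp)
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp <- imp[order(-imp$MaxImportance), ]
imp$Feature
# "cg_C" "cg_B" "cg_A"
```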
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("The top 20 features ranked by maximum importance:")
  print(head(importance_elastic_net_model1_df, n = 20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.9886

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = c("blue", 3:(length(classes) + 1)), lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_ENM1_AUC <- mean_auc
}
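The one-versus-rest averaging above relies on `pROC::roc()`; the same idea can be sketched without `pROC` using the rank-based (Mann-Whitney) form of the AUC. The class probabilities below are toy values, not study data:

```r
# Rank-based AUC, equivalent to the Mann-Whitney U statistic; base-R sketch.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1); n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy 3-class example: true classes and predicted class probabilities.
truth <- factor(c("CN", "CN", "MCI", "Dementia", "MCI", "Dementia"))
probs <- data.frame(CN       = c(0.8, 0.3, 0.2, 0.1, 0.3, 0.2),
                    MCI      = c(0.1, 0.6, 0.6, 0.2, 0.5, 0.3),
                    Dementia = c(0.1, 0.1, 0.2, 0.7, 0.2, 0.5))

# One-versus-rest: binarize each class in turn, then average the AUCs.
aucs <- sapply(levels(truth), function(cl)
  auc_rank(as.integer(truth == cl), probs[[cl]]))
round(mean(aucs), 4)
# 0.9167
```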

9.1.4. XGBoost

9.1.4.1. XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa     
##   0.3  1          0.6               0.50        50      0.6923232  0.19393378
##   0.3  1          0.6               0.50       100      0.7194949  0.27660286
##   0.3  1          0.6               0.50       150      0.7464646  0.32981320
##   0.3  1          0.6               0.75        50      0.7014141  0.16030418
##   0.3  1          0.6               0.75       100      0.7375758  0.28035474
##   0.3  1          0.6               0.75       150      0.7329293  0.26608172
##   0.3  1          0.6               1.00        50      0.7105051  0.18801244
##   0.3  1          0.6               1.00       100      0.6966667  0.16599624
##   0.3  1          0.6               1.00       150      0.7058586  0.19965314
##   0.3  1          0.8               0.50        50      0.7194949  0.24788786
##   0.3  1          0.8               0.50       100      0.7149495  0.23689636
##   0.3  1          0.8               0.50       150      0.7375758  0.30198312
##   0.3  1          0.8               0.75        50      0.6965657  0.17108403
##   0.3  1          0.8               0.75       100      0.7057576  0.18981931
##   0.3  1          0.8               0.75       150      0.7238384  0.25311208
##   0.3  1          0.8               1.00        50      0.6920202  0.12708769
##   0.3  1          0.8               1.00       100      0.6877778  0.14146498
##   0.3  1          0.8               1.00       150      0.7103030  0.20658063
##   0.3  2          0.6               0.50        50      0.6969697  0.15605074
##   0.3  2          0.6               0.50       100      0.7241414  0.24060600
##   0.3  2          0.6               0.50       150      0.7241414  0.24185795
##   0.3  2          0.6               0.75        50      0.6968687  0.14997724
##   0.3  2          0.6               0.75       100      0.7374747  0.27211504
##   0.3  2          0.6               0.75       150      0.7103030  0.20414917
##   0.3  2          0.6               1.00        50      0.7103030  0.16844914
##   0.3  2          0.6               1.00       100      0.7147475  0.17909555
##   0.3  2          0.6               1.00       150      0.7238384  0.21176120
##   0.3  2          0.8               0.50        50      0.7013131  0.18551375
##   0.3  2          0.8               0.50       100      0.7285859  0.26811703
##   0.3  2          0.8               0.50       150      0.7285859  0.27458915
##   0.3  2          0.8               0.75        50      0.6968687  0.15200467
##   0.3  2          0.8               0.75       100      0.7015152  0.18593286
##   0.3  2          0.8               0.75       150      0.7150505  0.22040761
##   0.3  2          0.8               1.00        50      0.6924242  0.13579366
##   0.3  2          0.8               1.00       100      0.7285859  0.22756327
##   0.3  2          0.8               1.00       150      0.7060606  0.17614523
##   0.3  3          0.6               0.50        50      0.7195960  0.19085058
##   0.3  3          0.6               0.50       100      0.7377778  0.24775314
##   0.3  3          0.6               0.50       150      0.7377778  0.24797092
##   0.3  3          0.6               0.75        50      0.7151515  0.19611838
##   0.3  3          0.6               0.75       100      0.7375758  0.26503982
##   0.3  3          0.6               0.75       150      0.7375758  0.26503982
##   0.3  3          0.6               1.00        50      0.7150505  0.16272734
##   0.3  3          0.6               1.00       100      0.7013131  0.13844463
##   0.3  3          0.6               1.00       150      0.6922222  0.12171266
##   0.3  3          0.8               0.50        50      0.6921212  0.11924936
##   0.3  3          0.8               0.50       100      0.7011111  0.14037121
##   0.3  3          0.8               0.50       150      0.7101010  0.18470813
##   0.3  3          0.8               0.75        50      0.6876768  0.12272053
##   0.3  3          0.8               0.75       100      0.7056566  0.16337281
##   0.3  3          0.8               0.75       150      0.7103030  0.17913642
##   0.3  3          0.8               1.00        50      0.7013131  0.14331140
##   0.3  3          0.8               1.00       100      0.7058586  0.17115365
##   0.3  3          0.8               1.00       150      0.6967677  0.15310033
##   0.4  1          0.6               0.50        50      0.7283838  0.31454989
##   0.4  1          0.6               0.50       100      0.7373737  0.34160799
##   0.4  1          0.6               0.50       150      0.7373737  0.35245591
##   0.4  1          0.6               0.75        50      0.6877778  0.16093451
##   0.4  1          0.6               0.75       100      0.7240404  0.25863887
##   0.4  1          0.6               0.75       150      0.7240404  0.25970810
##   0.4  1          0.6               1.00        50      0.7149495  0.21190020
##   0.4  1          0.6               1.00       100      0.7421212  0.28384150
##   0.4  1          0.6               1.00       150      0.7466667  0.30533795
##   0.4  1          0.8               0.50        50      0.7059596  0.21761065
##   0.4  1          0.8               0.50       100      0.7331313  0.29563287
##   0.4  1          0.8               0.50       150      0.7285859  0.28140726
##   0.4  1          0.8               0.75        50      0.6878788  0.14473168
##   0.4  1          0.8               0.75       100      0.7057576  0.18675468
##   0.4  1          0.8               0.75       150      0.7330303  0.26350458
##   0.4  1          0.8               1.00        50      0.7058586  0.17433151
##   0.4  1          0.8               1.00       100      0.7149495  0.23073300
##   0.4  1          0.8               1.00       150      0.7467677  0.29726976
##   0.4  2          0.6               0.50        50      0.6875758  0.18511508
##   0.4  2          0.6               0.50       100      0.7054545  0.21965669
##   0.4  2          0.6               0.50       150      0.7281818  0.28776332
##   0.4  2          0.6               0.75        50      0.6966667  0.15816905
##   0.4  2          0.6               0.75       100      0.7330303  0.26319107
##   0.4  2          0.6               0.75       150      0.7375758  0.27891082
##   0.4  2          0.6               1.00        50      0.6697980  0.09472114
##   0.4  2          0.6               1.00       100      0.6879798  0.12980757
##   0.4  2          0.6               1.00       150      0.6879798  0.12980757
##   0.4  2          0.8               0.50        50      0.7192929  0.22003564
##   0.4  2          0.8               0.50       100      0.7421212  0.28969840
##   0.4  2          0.8               0.50       150      0.7330303  0.25342143
##   0.4  2          0.8               0.75        50      0.7194949  0.18754998
##   0.4  2          0.8               0.75       100      0.7422222  0.24375524
##   0.4  2          0.8               0.75       150      0.7422222  0.24375524
##   0.4  2          0.8               1.00        50      0.7016162  0.14902128
##   0.4  2          0.8               1.00       100      0.7195960  0.21108772
##   0.4  2          0.8               1.00       150      0.7195960  0.21108772
##   0.4  3          0.6               0.50        50      0.7149495  0.25150965
##   0.4  3          0.6               0.50       100      0.7058586  0.21962839
##   0.4  3          0.6               0.50       150      0.7149495  0.23471328
##   0.4  3          0.6               0.75        50      0.7194949  0.18554314
##   0.4  3          0.6               0.75       100      0.7331313  0.21969450
##   0.4  3          0.6               0.75       150      0.7241414  0.19159266
##   0.4  3          0.6               1.00        50      0.7150505  0.21279375
##   0.4  3          0.6               1.00       100      0.7195960  0.22528572
##   0.4  3          0.6               1.00       150      0.7195960  0.22528572
##   0.4  3          0.8               0.50        50      0.7149495  0.21212825
##   0.4  3          0.8               0.50       100      0.7239394  0.23509761
##   0.4  3          0.8               0.50       150      0.7330303  0.26142928
##   0.4  3          0.8               0.75        50      0.7195960  0.23035611
##   0.4  3          0.8               0.75       100      0.7286869  0.24761207
##   0.4  3          0.8               0.75       150      0.7286869  0.25346814
##   0.4  3          0.8               1.00        50      0.6920202  0.12469183
##   0.4  3          0.8               1.00       100      0.6965657  0.13573829
##   0.4  3          0.8               1.00       150      0.6965657  0.13573829
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 1, eta = 0.4, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample = 1.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.7153068
FeatEval_Mean_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Mean_mean_accuracy_cv_xgb)
## [1] 0.7153068
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Mean_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Mean_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Mean_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Mean_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       62       17
##   Dementia  4       11
##                                          
##                Accuracy : 0.7766         
##                  95% CI : (0.679, 0.8561)
##     No Information Rate : 0.7021         
##     P-Value [Acc > NIR] : 0.068540       
##                                          
##                   Kappa : 0.3835         
##                                          
##  Mcnemar's Test P-Value : 0.008829       
##                                          
##             Sensitivity : 0.9394         
##             Specificity : 0.3929         
##          Pos Pred Value : 0.7848         
##          Neg Pred Value : 0.7333         
##              Prevalence : 0.7021         
##          Detection Rate : 0.6596         
##    Detection Prevalence : 0.8404         
##       Balanced Accuracy : 0.6661         
##                                          
##        'Positive' Class : CN             
## 
cm_FeatEval_Mean_xgb_Accuracy <-cm_FeatEval_Mean_xgb$overall["Accuracy"]
cm_FeatEval_Mean_xgb_Kappa <-cm_FeatEval_Mean_xgb$overall["Kappa"]

print(cm_FeatEval_Mean_xgb_Accuracy)
##  Accuracy 
## 0.7765957
print(cm_FeatEval_Mean_xgb_Kappa)
##     Kappa 
## 0.3835103
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## cg06864789  100.00
## cg24861747   85.53
## cg16390578   84.50
## cg15775217   82.44
## cg26948066   81.14
## cg22901347   80.74
## cg13885788   79.86
## cg25561557   79.68
## cg25174111   70.21
## cg02095601   70.09
## cg26739327   60.10
## cg27114706   59.22
## cg04124201   55.87
## cg24859648   53.89
## cg00999469   51.61
## cg03084184   51.60
## cg05749243   51.50
## cg26757229   51.42
## cg10058204   49.97
## PC1          49.55
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##        Feature         Gain       Cover   Frequency   Importance
##         <char>        <num>       <num>       <num>        <num>
##  1: cg06864789 0.0399896338 0.024382932 0.013333333 0.0399896338
##  2: cg24861747 0.0342035727 0.027589597 0.026666667 0.0342035727
##  3: cg16390578 0.0337906307 0.027326138 0.026666667 0.0337906307
##  4: cg15775217 0.0329670227 0.019141861 0.013333333 0.0329670227
##  5: cg26948066 0.0324495325 0.018523981 0.013333333 0.0324495325
##  6: cg22901347 0.0322887947 0.025370975 0.020000000 0.0322887947
##  7: cg13885788 0.0319358780 0.019851728 0.013333333 0.0319358780
##  8: cg25561557 0.0318625184 0.021141091 0.013333333 0.0318625184
##  9: cg25174111 0.0280786013 0.014055708 0.006666667 0.0280786013
## 10: cg02095601 0.0280268066 0.028643603 0.026666667 0.0280268066
## 11: cg26739327 0.0240347054 0.024483239 0.020000000 0.0240347054
## 12: cg27114706 0.0236822007 0.013480351 0.006666667 0.0236822007
## 13: cg04124201 0.0223402759 0.013286931 0.006666667 0.0223402759
## 14: cg24859648 0.0215504758 0.015311243 0.013333333 0.0215504758
## 15: cg00999469 0.0206368384 0.012849563 0.006666667 0.0206368384
## 16: cg03084184 0.0206338341 0.020872011 0.020000000 0.0206338341
## 17: cg05749243 0.0205950108 0.012978782 0.006666667 0.0205950108
## 18: cg26757229 0.0205629552 0.017975754 0.013333333 0.0205629552
## 19: cg10058204 0.0199810362 0.012698551 0.006666667 0.0199810362
## 20:        PC1 0.0198155958 0.025629749 0.026666667 0.0198155958
## 21: cg07152869 0.0186776417 0.017577411 0.013333333 0.0186776417
## 22: cg23698271 0.0186579909 0.016901081 0.013333333 0.0186579909
## 23: cg19503462 0.0174827390 0.015534180 0.013333333 0.0174827390
## 24: cg05096415 0.0170792588 0.019723195 0.020000000 0.0170792588
## 25: cg12080266 0.0168143996 0.017557463 0.020000000 0.0168143996
## 26: cg10701746 0.0161775247 0.016528967 0.013333333 0.0161775247
## 27: cg00421199 0.0160789336 0.016797090 0.013333333 0.0160789336
## 28: cg10542624 0.0151217678 0.017040380 0.020000000 0.0151217678
## 29: cg02356645 0.0141399224 0.013634731 0.013333333 0.0141399224
## 30: cg01013522 0.0138386065 0.014945471 0.013333333 0.0138386065
## 31: cg11314779 0.0132496329 0.013660804 0.013333333 0.0132496329
## 32: cg18339359 0.0129194314 0.017606792 0.020000000 0.0129194314
## 33: cg06378561 0.0124730481 0.019639317 0.026666667 0.0124730481
## 34: cg04109990 0.0121095145 0.014257689 0.013333333 0.0121095145
## 35: cg01680303 0.0119948239 0.013509283 0.013333333 0.0119948239
## 36: cg20913114 0.0112033450 0.010448780 0.006666667 0.0112033450
## 37: cg23350716 0.0109529532 0.012276923 0.013333333 0.0109529532
## 38: cg20218135 0.0109084045 0.016989693 0.020000000 0.0109084045
## 39: cg14252149 0.0096559131 0.015447900 0.020000000 0.0096559131
## 40: cg04218584 0.0094613374 0.009535872 0.006666667 0.0094613374
## 41: cg27452255 0.0087948012 0.011773018 0.013333333 0.0087948012
## 42: cg19512141 0.0085954278 0.014242020 0.020000000 0.0085954278
## 43: cg02872767 0.0084331425 0.009396523 0.006666667 0.0084331425
## 44: cg05841700 0.0083460194 0.011291900 0.013333333 0.0083460194
## 45: cg04242342 0.0082068607 0.011742106 0.013333333 0.0082068607
## 46: cg09584650 0.0080562843 0.014125148 0.020000000 0.0080562843
## 47: cg24851651 0.0078705647 0.011121638 0.013333333 0.0078705647
## 48: cg12858518 0.0077581506 0.009317559 0.006666667 0.0077581506
## 49: cg22274273 0.0074702543 0.012069762 0.013333333 0.0074702543
## 50: cg15399577 0.0072381233 0.008727658 0.006666667 0.0072381233
## 51: cg11358878 0.0072221286 0.012819511 0.020000000 0.0072221286
## 52: cg10844498 0.0069066888 0.012670825 0.020000000 0.0069066888
## 53: cg17002338 0.0068463348 0.009956591 0.013333333 0.0068463348
## 54: cg23916408 0.0067990136 0.011225618 0.013333333 0.0067990136
## 55: cg05321907 0.0056813706 0.007985303 0.006666667 0.0056813706
## 56: cg00977253 0.0048957652 0.007351797 0.006666667 0.0048957652
## 57: cg15700429 0.0048350854 0.007166752 0.006666667 0.0048350854
## 58: cg06697310 0.0048240607 0.007073905 0.006666667 0.0048240607
## 59: cg03982462 0.0043869612 0.007881175 0.013333333 0.0043869612
## 60: cg03172493 0.0041560181 0.006284846 0.006666667 0.0041560181
## 61: cg00648024 0.0041212426 0.006653304 0.006666667 0.0041212426
## 62: cg04867412 0.0040737939 0.008396069 0.013333333 0.0040737939
## 63: cg11787167 0.0038015937 0.007971394 0.013333333 0.0038015937
## 64: cg01130884 0.0037377434 0.006184354 0.006666667 0.0037377434
## 65: cg10507965 0.0032406400 0.006896931 0.013333333 0.0032406400
## 66: cg02389264 0.0028402356 0.005412608 0.006666667 0.0028402356
## 67: cg04798314 0.0027079534 0.005187031 0.006666667 0.0027079534
## 68: cg18037388 0.0023725805 0.004774707 0.006666667 0.0023725805
## 69: cg00675157 0.0022345678 0.004307275 0.006666667 0.0022345678
## 70: cg02627240 0.0020918070 0.004128251 0.006666667 0.0020918070
## 71: cg06870118 0.0018991215 0.004069883 0.006666667 0.0018991215
## 72: cg09650803 0.0017943950 0.003583033 0.006666667 0.0017943950
## 73: cg03640465 0.0017530752 0.003792699 0.006666667 0.0017530752
## 74: cg02932958 0.0015433730 0.003472917 0.006666667 0.0015433730
## 75: cg04831745 0.0014059088 0.003332586 0.006666667 0.0014059088
## 76: cg12279734 0.0012199520 0.002971032 0.006666667 0.0012199520
## 77: cg26007606 0.0011215072 0.002812926 0.006666667 0.0011215072
## 78: cg16089727 0.0008901670 0.002428507 0.006666667 0.0008901670
## 79: cg17329602 0.0007168980 0.002135510 0.006666667 0.0007168980
## 80: cg27187580 0.0006872803 0.002058519 0.006666667 0.0006872803
##        Feature         Gain       Cover   Frequency   Importance
stopCluster(c2)
registerDoSEQ()
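The frequency / common feature selection described in the header (take the top-N features per model, count how often each feature appears, and keep those selected by more than half of the models) can be sketched in base R. The top-feature lists below are hypothetical, for illustration only:

```r
# Hypothetical top-N feature lists from three trained models.
top_features <- list(
  elastic_net = c("cg06864789", "cg15775217", "cg26739327", "cg07152869"),
  xgboost     = c("cg06864789", "cg24861747", "cg15775217", "cg26739327"),
  rf          = c("cg06864789", "cg16390578", "cg15775217", "cg05096415")
)

# Count how many models selected each feature, then keep features
# appearing in more than half of the models.
counts <- table(unlist(top_features))
common_features <- names(counts[counts > length(top_features) / 2])
common_features
# "cg06864789" "cg15775217" "cg26739327"
```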
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7944

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  cols <- c("blue", rainbow(length(classes) - 1))
  plot(roc_curves[[1]], col = cols[1], 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_xgb_AUC <- mean_auc
}
print(FeatEval_Mean_xgb_AUC)
## Area under the curve: 0.7944
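For reference, the one-versus-rest macro AUC computed above can also be obtained without pROC via the rank-sum (Mann-Whitney) identity. The sketch below is illustrative only: `auc_binary()` is a helper written here for this example (not part of the pipeline), and the data are synthetic.

```r
# Sketch: macro-averaged one-versus-rest AUC on synthetic data, base R only.
# auc_binary() is a hypothetical helper using the rank-sum identity.
auc_binary <- function(labels, scores) {
  # labels: 0/1 vector; scores: predicted probability of class 1
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
classes <- c("CN", "MCI", "Dementia")
y <- factor(sample(classes, 300, replace = TRUE), levels = classes)
# Fake probability matrix, uninformative by construction (AUC should be ~0.5)
probs <- matrix(runif(300 * 3), ncol = 3, dimnames = list(NULL, classes))
probs <- probs / rowSums(probs)

aucs <- sapply(classes, function(cl) auc_binary(as.integer(y == cl), probs[, cl]))
macro_auc <- mean(aucs)
print(aucs)
print(macro_auc)
```

With real, informative probabilities this reproduces the per-class AUCs that `roc()` reports above, averaged exactly as in the `mean(auc_values)` step.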

9.1.5. Random Forest

9.1.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1 <- processed_data 
featureName_RFM1 <- AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)


print(rf_model)
## Random Forest 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa      
##     2   0.7014141  0.000000000
##   126   0.6968687  0.003911686
##   250   0.7014141  0.023684211
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.699899
FeatEval_Mean_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Mean_mean_accuracy_cv_rf)
## [1] 0.699899
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")


train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Mean_rf_trainAccuracy<-train_accuracy
print(FeatEval_Mean_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Mean_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Mean_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       28
##   Dementia  0        0
##                                          
##                Accuracy : 0.7021         
##                  95% CI : (0.599, 0.7921)
##     No Information Rate : 0.7021         
##     P-Value [Acc > NIR] : 0.5508         
##                                          
##                   Kappa : 0              
##                                          
##  Mcnemar's Test P-Value : 3.352e-07      
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.0000         
##          Pos Pred Value : 0.7021         
##          Neg Pred Value :    NaN         
##              Prevalence : 0.7021         
##          Detection Rate : 0.7021         
##    Detection Prevalence : 1.0000         
##       Balanced Accuracy : 0.5000         
##                                          
##        'Positive' Class : CN             
## 
cm_FeatEval_Mean_rf_Accuracy<-cm_FeatEval_Mean_rf$overall["Accuracy"]
print(cm_FeatEval_Mean_rf_Accuracy)
##  Accuracy 
## 0.7021277
cm_FeatEval_Mean_rf_Kappa<-cm_FeatEval_Mean_rf$overall["Kappa"]
print(cm_FeatEval_Mean_rf_Kappa)
## Kappa 
##     0
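The confusion matrix above shows the random forest collapsed onto the majority class (every test sample predicted CN, Kappa = 0), a typical symptom of class imbalance. One common remedy is to balance the training set before fitting; caret also supports this directly via `trainControl(sampling = "down")`. The snippet below is a minimal base-R sketch on toy data, not part of the pipeline: `downsample()` is a hypothetical helper.

```r
# Sketch: balance classes by downsampling the majority class, base R only.
# Assumes the data frame has a factor target column (here "DX").
downsample <- function(df, target) {
  tab   <- table(df[[target]])
  n_min <- min(tab)                       # size of the smallest class
  idx <- unlist(lapply(names(tab), function(cl) {
    rows <- which(df[[target]] == cl)
    sample(rows, n_min)                   # keep n_min rows per class
  }))
  df[idx, , drop = FALSE]
}

set.seed(123)
toy <- data.frame(DX = factor(rep(c("CN", "Dementia"), c(150, 70))),
                  x  = rnorm(220))
balanced <- downsample(toy, "DX")
table(balanced$DX)   # both classes now have 70 rows
```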
importance_rf_model <- varImp(rf_model)


print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Importance
## cg04124201     100.00
## cg10542624      95.55
## cg12776173      89.48
## cg17329602      89.36
## cg19503462      83.36
## cg22274273      83.18
## cg13885788      82.67
## cg26948066      79.92
## cg25758034      79.34
## cg02356645      78.94
## cg05841700      78.80
## cg25174111      78.72
## cg05130642      76.48
## cg15775217      76.24
## cg00156497      74.71
## cg14780448      74.64
## cg04073914      73.77
## cg07152869      73.54
## cg14924512      72.54
## cg00648024      72.04
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5 ){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)

  # Note: arrange() drops row names, so move the CpG names into a column first;
  # otherwise the printed table shows only row numbers.
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(MCI))

  print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)

  # Note: arrange() drops row names, so move the CpG names into a column first;
  # otherwise the printed table shows only row numbers.
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(Dementia))

  print(Ordered_importance_rf_final_model)
}
##               CN     Dementia
## 1    2.535044733  2.535044733
## 2    2.330999961  2.330999961
## 3    2.052691755  2.052691755
## 4    2.047123394  2.047123394
## 5    1.771923145  1.771923145
## 6    1.763926549  1.763926549
## 7    1.740302955  1.740302955
## 8    1.614625216  1.614625216
## 9    1.587932419  1.587932419
## 10   1.569558108  1.569558108
##  [ rows 11-250 omitted; the CN and Dementia columns are identical ]
if(METHOD_FEATURE_FLAG==3 ){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)

  # Note: arrange() drops row names, so move the CpG names into a column first;
  # otherwise the printed table shows only row numbers.
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(CI))

  print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))


  print(importance_rf_model_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  library(reshape2)  # for melt()
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("The top 20 features ranked by maximum per-class importance:")
  print(head(importance_rf_model_df, n = 20)$Feature)
  
  library(reshape2)  # for melt()
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.7938

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  cols <- c("blue", rainbow(length(classes) - 1))
  plot(roc_curves[[1]], col = cols[1], 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    
    FeatEval_Mean_rf_AUC <- mean_auc
}
print(FeatEval_Mean_rf_AUC)
## Area under the curve: 0.7938

9.1.6. SVM

9.1.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 176, 177, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.9232323  0.8274853
##   0.50  0.9231313  0.8265174
##   1.00  0.9277778  0.8365013
## 
## Tuning parameter 'sigma' was held constant at a value of 0.002033218
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002033218 and C = 1.
print(svm_model$bestTune)
##         sigma C
## 3 0.002033218 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.9247138
FeatEval_Mean_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Mean_mean_accuracy_cv_svm)
## [1] 0.9247138
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.995475113122172"
FeatEval_Mean_svm_trainAccuracy <- train_accuracy
print(FeatEval_Mean_svm_trainAccuracy)
## [1] 0.9954751
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Mean_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Mean_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       60        2
##   Dementia  6       26
##                                           
##                Accuracy : 0.9149          
##                  95% CI : (0.8392, 0.9625)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 5.403e-07       
##                                           
##                   Kappa : 0.8046          
##                                           
##  Mcnemar's Test P-Value : 0.2888          
##                                           
##             Sensitivity : 0.9091          
##             Specificity : 0.9286          
##          Pos Pred Value : 0.9677          
##          Neg Pred Value : 0.8125          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6383          
##    Detection Prevalence : 0.6596          
##       Balanced Accuracy : 0.9188          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Mean_svm_Accuracy <- cm_FeatEval_Mean_svm$overall["Accuracy"]
cm_FeatEval_Mean_svm_Kappa <- cm_FeatEval_Mean_svm$overall["Kappa"]
print(cm_FeatEval_Mean_svm_Accuracy)
##  Accuracy 
## 0.9148936
print(cm_FeatEval_Mean_svm_Kappa)
##     Kappa 
## 0.8045738

Let’s take a look at the feature importance of the trained model.

library(iml)

predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 315 rows and 251 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1        PC1      1.111111   1.222222      1.311111        0.03492063
## 2 cg25174111      1.000000   1.111111      1.111111        0.03174603
## 3 cg16390578      1.111111   1.111111      1.111111        0.03174603
## 4 cg03172493      1.111111   1.111111      1.111111        0.03174603
## 5        PC2      1.000000   1.111111      1.111111        0.03174603
## 6 cg03084184      1.111111   1.111111      1.222222        0.03174603
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1,
    target = "DX", nsim = 10, metric = "bal_accuracy",
    pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
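`FeatureImp` above estimates permutation importance: shuffle one feature, re-measure the loss, and report the ratio to the baseline loss. A minimal base-R sketch of the same idea, on synthetic data with a toy `glm` (none of these objects belong to the pipeline):

```r
# Sketch: permutation importance by hand, as iml::FeatureImp estimates it.
set.seed(42)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- as.integer(plogis(2 * d$x1) > runif(n))  # only x1 drives the outcome

fit <- glm(y ~ x1 + x2, data = d, family = binomial)

# Classification error at a 0.5 threshold (analogous to loss = "ce")
err <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")  # P(y = 1)
  mean((p > 0.5) != (data$y == 1))
}

base_err <- err(fit, d)
perm_importance <- sapply(c("x1", "x2"), function(f) {
  shuffled <- d
  shuffled[[f]] <- sample(shuffled[[f]])  # break the feature-outcome link
  err(fit, shuffled) / base_err           # ratio > 1 means the feature mattered
})
print(perm_importance)
```

Expect a ratio well above 1 for `x1` and near 1 for `x2`, mirroring how the `importance` column above ranks CpGs.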
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (test_data_SVM1$DX Dementia) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9659
## [1] "The auc vlue is:"
## Area under the curve: 0.9659

if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  cols <- c("blue", rainbow(length(classes) - 1))
  plot(roc_curves[[1]], col = cols[1], 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_svm_AUC <- mean_auc
    
}
print(FeatEval_Mean_svm_AUC)
## Area under the curve: 0.9659

9.2 Selected Based on Median

9.2.1 Input Feature For Evaluation

Here we evaluate the performance of the features selected based on median importance.
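As a concrete illustration of median-based selection: collect each model's importance vector over the same features, take the per-feature median across models, and keep the top N. The sketch below uses made-up numbers (`imp_list` and its values are hypothetical), not the pipeline's actual importances.

```r
# Sketch: median-based feature selection across models, toy numbers only.
imp_list <- list(
  xgb = c(cgA = 0.9, cgB = 0.2, cgC = 0.5),
  rf  = c(cgA = 0.7, cgB = 0.1, cgC = 0.6),
  svm = c(cgA = 0.8, cgB = 0.4, cgC = 0.3)
)
imp_mat    <- do.call(rbind, imp_list)          # models x features
median_imp <- apply(imp_mat, 2, median)         # per-feature median importance

top_n    <- 2
selected <- names(sort(median_imp, decreasing = TRUE))[seq_len(top_n)]
print(selected)  # cgA and cgC have the highest median importance
```

The same recipe with `mean` in place of `median` gives the mean-based variant evaluated in Section 9.1.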

processed_dataFrame<-df_selected_Median
processed_data<-output_median_feature

AfterProcess_FeatureName<-Selected_median_imp_Name
print(head(output_median_feature))
## # A tibble: 6 × 251
##   DX    cg24861747      PC1 cg01013522 cg04242342 cg06864789 cg23836570 cg25174111 cg02356645 cg04124201 cg00999469     PC2 cg23698271 cg14780448 cg15775217 cg26739327 cg12279734 cg22274273 cg16390578
##   <fct>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CN         0.431 -0.173        0.886     0.817      0.461      0.543       0.857      0.583      0.331      0.286  0.0575      0.911     0.670       0.917     0.769       0.149     0.425      0.210 
## 2 CN         0.807 -0.00367      0.543     0.804      0.875      0.0327      0.257      0.570      0.324      0.250  0.0837      0.905     0.621       0.604     0.873       0.876     0.420      0.0639
## 3 Deme…      0.335 -0.187        0.843     0.829      0.490      0.599       0.190      0.568      0.433      0.282 -0.0112      0.880     0.0443      0.906     0.834       0.867     0.416      0.231 
## 4 CN         0.600 -0.0379       0.824     0.443      0.0542     0.573       0.205      0.919      0.307      0.297  0.0157      0.528     0.913       0.638     0.105       0.866     0.0230     0.213 
## 5 Deme…      0.773 -0.139        0.512     0.419      0.835      0.918       0.866      0.907      0.375      0.290  0.0299      0.910     0.911       0.570     0.757       0.610     0.413      0.245 
## 6 CN         0.731 -0.213        0.492     0.0282     0.374      0.502       0.203      0.895      0.373      0.927  0.0518      0.903     0.655       0.886     0.0827      0.600     0.0300     0.871 
## # ℹ 232 more variables: cg09650803 <dbl>, cg18339359 <dbl>, cg07152869 <dbl>, cg03172493 <dbl>, cg06697310 <dbl>, cg13885788 <dbl>, cg18037388 <dbl>, cg20507276 <dbl>, cg22901347 <dbl>,
## #   cg25561557 <dbl>, cg27114706 <dbl>, PC3 <dbl>, cg19503462 <dbl>, cg26983017 <dbl>, cg09216282 <dbl>, cg10542624 <dbl>, cg11787167 <dbl>, cg02095601 <dbl>, cg02078724 <dbl>, cg02872767 <dbl>,
## #   cg18662228 <dbl>, cg23916408 <dbl>, cg04109990 <dbl>, cg20218135 <dbl>, cg12858518 <dbl>, cg03982462 <dbl>, cg01680303 <dbl>, cg03084184 <dbl>, cg06870118 <dbl>, cg12306781 <dbl>,
## #   cg00322003 <dbl>, cg12471283 <dbl>, cg12080266 <dbl>, cg06378561 <dbl>, cg14252149 <dbl>, cg27452255 <dbl>, cg11358878 <dbl>, cg00421199 <dbl>, cg05749243 <dbl>, cg17118775 <dbl>,
## #   cg24859648 <dbl>, cg13799572 <dbl>, cg00977253 <dbl>, cg15591384 <dbl>, cg02932958 <dbl>, cg12108278 <dbl>, cg07584620 <dbl>, cg00841008 <dbl>, cg23432430 <dbl>, cg03392100 <dbl>,
## #   cg16715186 <dbl>, cg05096415 <dbl>, cg08584917 <dbl>, cg08242313 <dbl>, cg02389264 <dbl>, cg09584650 <dbl>, cg16268937 <dbl>, cg26948066 <dbl>, cg26757229 <dbl>, cg04218584 <dbl>,
## #   cg05373298 <dbl>, cg05841700 <dbl>, cg26474732 <dbl>, cg27286614 <dbl>, cg10701746 <dbl>, cg26901661 <dbl>, cg06624143 <dbl>, cg18821122 <dbl>, cg17044529 <dbl>, cg11173002 <dbl>, …
print(Selected_median_imp_Name)
##   [1] "cg24861747" "PC1"        "cg01013522" "cg04242342" "cg06864789" "cg23836570" "cg25174111" "cg02356645" "cg04124201" "cg00999469" "PC2"        "cg23698271" "cg14780448" "cg15775217" "cg26739327"
##  [16] "cg12279734" "cg22274273" "cg16390578" "cg09650803" "cg18339359" "cg07152869" "cg03172493" "cg06697310" "cg13885788" "cg18037388" "cg20507276" "cg22901347" "cg25561557" "cg27114706" "PC3"       
##  [31] "cg19503462" "cg26983017" "cg09216282" "cg10542624" "cg11787167" "cg02095601" "cg02078724" "cg02872767" "cg18662228" "cg23916408" "cg04109990" "cg20218135" "cg12858518" "cg03982462" "cg01680303"
##  [46] "cg03084184" "cg06870118" "cg12306781" "cg00322003" "cg12471283" "cg12080266" "cg06378561" "cg14252149" "cg27452255" "cg11358878" "cg00421199" "cg05749243" "cg17118775" "cg24859648" "cg13799572"
##  [61] "cg00977253" "cg15591384" "cg02932958" "cg12108278" "cg07584620" "cg00841008" "cg23432430" "cg03392100" "cg16715186" "cg05096415" "cg08584917" "cg08242313" "cg02389264" "cg09584650" "cg16268937"
##  [76] "cg26948066" "cg26757229" "cg04218584" "cg05373298" "cg05841700" "cg26474732" "cg27286614" "cg10701746" "cg26901661" "cg06624143" "cg18821122" "cg17044529" "cg11173002" "cg15399577" "cg16338321"
##  [91] "cg20913114" "cg03115532" "cg04831745" "cg19555075" "cg02901522" "cg24104387" "cg21501207" "cg12702014" "cg01280698" "cg15730644" "cg02217425" "cg14924512" "cg04798314" "cg11314779" "cg00675157"
## [106] "cg11247378" "cg12556569" "cg23161429" "cg05059349" "cg02494911" "cg24065597" "cg14904299" "cg19512141" "cg21533482" "cg16098618" "cg16858433" "cg17623720" "cg23350716" "cg12240569" "cg13226272"
## [121] "cg06536614" "cg04467639" "cg26007606" "cg06264882" "cg10666341" "cg03640465" "cg04970287" "cg11706829" "cg21578644" "cg17386240" "cg21986118" "cg02302183" "cg05321907" "cg14764203" "cg15700429"
## [136] "cg13080267" "cg11331837" "cg11834635" "cg17419220" "cg10058204" "cg24851651" "cg07971231" "cg10507965" "cg26889118" "cg22071943" "cg18526121" "cg07304760" "cg00648024" "cg17329602" "cg22653957"
## [151] "cg16361249" "cg05455372" "cg02495179" "cg05377703" "cg02656016" "cg11227702" "cg27187580" "cg10786572" "cg06875704" "cg02981548" "cg04577745" "cg12434901" "cg12421087" "cg11835797" "cg27224751"
## [166] "cg02627240" "cg11109139" "cg07456472" "cg09247979" "cg07138269" "cg01802772" "cg09518270" "cg17429539" "cg12776173" "cg26052728" "cg03628603" "cg15501526" "cg14465143" "cg01130884" "cg08397053"
## [181] "cg11716267" "cg12074150" "cg00051154" "cg18861767" "cg25758034" "cg21575308" "cg03327352" "cg03057303" "cg04073914" "cg04664583" "cg00156497" "cg17002338" "cg04845852" "cg12738248" "cg12466610"
## [196] "cg14609402" "cg01097733" "cg12012426" "cg04033559" "cg17811452" "cg16310958" "cg20300784" "cg02489327" "cg23813394" "cg00332268" "cg06012621" "cg23840008" "cg27341708" "cg20094343" "cg27577781"
## [211] "cg22681945" "cg03167407" "cg16089727" "cg02823329" "cg23947654" "cg04768387" "cg10844498" "cg03359067" "cg14170504" "cg17906851" "cg12333628" "cg12284872" "cg05351360" "cg19248407" "cg15535896"
## [226] "cg24422984" "cg18310072" "cg27639199" "cg26081710" "cg06032337" "cg04771146" "cg24638099" "cg18029737" "cg09993718" "cg04867412" "cg12689021" "cg20070588" "cg16020483" "cg14181112" "cg01608425"
## [241] "cg10829391" "cg13375589" "cg05161773" "cg21757617" "cg05125667" "cg10985055" "cg17348244" "cg12293347" "cg16733676" "cg05813498"
print(head(df_selected_Median))
##                           DX cg24861747          PC1 cg01013522 cg04242342 cg06864789 cg23836570 cg25174111 cg02356645 cg04124201 cg00999469         PC2 cg23698271 cg14780448 cg15775217 cg26739327
## 200223270003_R03C01       CN  0.4309505 -0.172761185  0.8862821  0.8167892  0.4605312 0.54259383  0.8573844  0.5833923  0.3308589  0.2857719  0.05745834  0.9109565 0.67021018  0.9168327  0.7693268
## 200223270003_R06C01       CN  0.8071462 -0.003667305  0.5425308  0.8040357  0.8751365 0.03267304  0.2567745  0.5701428  0.3241613  0.2499229  0.08372861  0.9051701 0.62073547  0.6042521  0.8727608
## 200223270003_R07C01 Dementia  0.3347317 -0.186779607  0.8429862  0.8286115  0.4902033 0.59939745  0.1903803  0.5683381  0.4332693  0.2819622 -0.01117250  0.8804362 0.04425741  0.9062231  0.8340445
##                     cg12279734 cg22274273 cg16390578 cg09650803 cg18339359 cg07152869 cg03172493 cg06697310 cg13885788 cg18037388 cg20507276  cg22901347 cg25561557 cg27114706          PC3 cg19503462
## 200223270003_R03C01  0.1494651  0.4246379 0.20983422  0.8954464  0.9040272   0.505063 0.63362492  0.8653044  0.9369476  0.7545086 0.38721972 0.001690332 0.03851635  0.9359259  0.005055871  0.4537684
## 200223270003_R06C01  0.8760759  0.4196796 0.06389068  0.9113477  0.8552121   0.835249 0.06148804  0.2405168  0.5163017  0.7294565 0.47978438 0.103413834 0.47259480  0.9285384  0.029143653  0.6997359
## 200223270003_R07C01  0.8674214  0.4164100 0.23101450  0.2518414  0.3073106   0.519430 0.64562298  0.8479193  0.9183376  0.2391659 0.02261996 0.632991482 0.43364249  0.4787397 -0.032302430  0.7189778
##                     cg26983017 cg09216282 cg10542624 cg11787167 cg02095601 cg02078724 cg02872767 cg18662228 cg23916408 cg04109990 cg20218135 cg12858518 cg03982462 cg01680303 cg03084184 cg06870118
## 200223270003_R03C01 0.03145466  0.9244259 0.02189577 0.04673831  0.9161259  0.2896133  0.3886537  0.8730153  0.9154993  0.6476604 0.64278153  0.9285252  0.6023731  0.1344941  0.7877128  0.8100144
## 200223270003_R06C01 0.84677625  0.9263996 0.54330620 0.32564508  0.2233062  0.2805612  0.9099575  0.8602464  0.8886255  0.6692040 0.06509247  0.9017533  0.8778458  0.7573869  0.4546397  0.7802055
## 200223270003_R07C01 0.53922255  0.9352308 0.54991492 0.43162543  0.8978191  0.2739571  0.8603283  0.8683578  0.8872447  0.9024920 0.65642359  0.9187879  0.8860227  0.4772204  0.7812413  0.7917257
##                     cg12306781 cg00322003 cg12471283 cg12080266 cg06378561 cg14252149 cg27452255 cg11358878 cg00421199 cg05749243 cg17118775 cg24859648 cg13799572 cg00977253 cg15591384 cg02932958
## 200223270003_R03C01  0.8663817  0.5702070  0.8658731  0.9450629  0.9377503 0.02450779  0.6593379 0.83252951  0.8532461  0.9209685  0.5585676 0.44392797  0.8449584  0.9145988  0.7870275  0.4210489
## 200223270003_R06C01  0.8027798  0.3077122  0.6963410  0.9363381  0.5154019 0.02382413  0.9012217 0.87521203  0.8891803  0.9143061  0.2916054 0.03341185  0.4409219  0.8944518  0.7429614  0.3825995
## 200223270003_R07C01  0.8787250  0.6104341  0.6680611  0.6398247  0.9403569 0.56212480  0.8898635 0.08917903  0.8937751  0.9121180  0.2868948 0.43582347  0.8516975  0.9150206  0.8346279  0.7617081
##                     cg12108278 cg07584620 cg00841008 cg23432430 cg03392100 cg16715186 cg05096415 cg08584917 cg08242313 cg02389264 cg09584650 cg16268937 cg26948066 cg26757229 cg04218584 cg05373298
## 200223270003_R03C01  0.9243869  0.3763980 0.61899333  0.9455418  0.9227394  0.7946153  0.5177819  0.9019732  0.8953645  0.7900942 0.09661586  0.8931712  0.5026045  0.1422661  0.8971263 0.02652391
## 200223270003_R06C01  0.9068995  0.8530961 0.05401588  0.9418716  0.8902340  0.8124316  0.6288426  0.9187789  0.8573493  0.7789974 0.52399749  0.9034556  0.9101976  0.7933794  0.8491768 0.83538124
## 200223270003_R07C01  0.9131367  0.3888623 0.90769205  0.9426559  0.4359657  0.7773263  0.6060271  0.6007449  0.8992114  0.4174463 0.11587211  0.8928450  0.9379543  0.8074830  0.9008137 0.89506024
##                     cg05841700 cg26474732 cg27286614 cg10701746 cg26901661 cg06624143 cg18821122 cg17044529 cg11173002 cg15399577 cg16338321 cg20913114 cg03115532 cg04831745 cg19555075 cg02901522
## 200223270003_R03C01  0.9146488  0.8184088  0.5933858  0.4868342  0.8754981  0.4899758  0.5901603  0.9117895  0.5913599  0.8785443  0.8294062 0.80382984  0.8659608 0.71214149  0.4921409  0.9372901
## 200223270003_R06C01  0.3737990  0.7358417  0.6348795  0.4927257  0.9021064  0.9107688  0.5779620  0.9290636  0.1878736  0.8703169  0.4918708 0.03158439  0.8533871 0.06871768  0.4261618  0.4954978
## 200223270003_R07C01  0.5046468  0.7509296  0.9468370  0.8552180  0.8556831  0.9217350  0.9251431  0.9402858  0.5150840  0.8968856  0.5245645 0.81256840  0.4416574 0.90994644  0.4694729  0.9381188
##                     cg24104387 cg21501207 cg12702014 cg01280698 cg15730644 cg02217425 cg14924512 cg04798314 cg11314779 cg00675157 cg11247378 cg12556569 cg23161429 cg05059349 cg02494911 cg24065597
## 200223270003_R03C01  0.5339034  0.6813712  0.7848681 0.88462009  0.4353906  0.1032503  0.9160885 0.07119798  0.8966100  0.9242325  0.7874849 0.03924599  0.9099619 0.04507417  0.2416332  0.2221098
## 200223270003_R06C01  0.3007614  0.4747229  0.8065993 0.88471320  0.8763048  0.6592850  0.9088414 0.09248843  0.8908661  0.9254708  0.4807942 0.48636893  0.8833895 0.03898752  0.2520909  0.7036129
## 200223270003_R07C01  0.7509780  0.7422003  0.7458594 0.06370005  0.4833709  0.8792021  0.9081681 0.06972566  0.9048316  0.5447244  0.4537348 0.46498877  0.9134709 0.85329923  0.2457032  0.2407676
##                     cg14904299 cg19512141 cg21533482 cg16098618 cg16858433 cg17623720 cg23350716 cg12240569 cg13226272 cg06536614 cg04467639 cg26007606 cg06264882 cg10666341 cg03640465 cg04970287
## 200223270003_R03C01  0.2712472  0.7903543  0.8288469  0.2571464  0.9194211  0.8988624  0.7876873 0.02690547  0.5410002  0.5746694  0.6400206  0.5615550 0.43678655  0.6731062  0.2531644  0.8875750
## 200223270003_R06C01  0.8364544  0.8404684  0.6766373  0.6899734  0.9271632  0.8172384  0.6960544 0.46030640  0.4437070  0.5773468  0.5657041  0.1463111 0.43703442  0.6443180  0.2904433  0.4651667
## 200223270003_R07C01  0.8193867  0.2202759  0.6235932  0.6488005  0.9288986  0.8226085  0.7387498 0.86185839  0.0265215  0.5848917  0.6302917  0.8101800 0.02439581  0.8970292  0.9024530  0.9092326
##                     cg11706829 cg21578644 cg17386240 cg21986118 cg02302183 cg05321907 cg14764203 cg15700429 cg13080267 cg11331837 cg11834635 cg17419220 cg10058204 cg24851651 cg07971231 cg10507965
## 200223270003_R03C01  0.5444785  0.9260863  0.7144809  0.6571296  0.9191148  0.1782629  0.4683709  0.9114530 0.78371483 0.57150125  0.8880887 0.43470227  0.5834496 0.05358297  0.8406145  0.4010973
## 200223270003_R06C01  0.5669449  0.9159726  0.8074824  0.7034445  0.8749250  0.8427929  0.8916566  0.8838233 0.09436069 0.03182862  0.2493491 0.02781411  0.0549494 0.05968923  0.8447914  0.4033691
## 200223270003_R07C01  0.8746449  0.9178001  0.7227918  0.9055894  0.8888247  0.8320504  0.8714472  0.9095363 0.09351259 0.03832164  0.2210428 0.42803809  0.5689591 0.60864179  0.8874706  0.3869543
##                     cg26889118 cg22071943 cg18526121 cg07304760 cg00648024 cg17329602 cg22653957 cg16361249 cg05455372 cg02495179 cg05377703 cg02656016 cg11227702 cg27187580 cg10786572 cg06875704
## 200223270003_R03C01  0.9154836  0.2442648  0.4762313  0.5798534 0.40202875  0.8189317  0.6442184 0.52843073  0.5532370  0.7373055  0.8213047  0.2355680 0.49184121  0.6643576  0.5982086  0.9181165
## 200223270003_R06C01  0.9101336  0.2644581  0.4833367  0.5575516 0.05579011  0.8478185  0.9531308 0.09039669  0.6375708  0.5588114  0.5152514  0.9052318 0.02543724  0.6914924  0.0935115  0.9200461
## 200223270003_R07C01  0.5759967  0.2599947  0.7761450  0.9195617 0.03708944  0.8596400  0.6534542 0.42039062  0.8095964  0.5273309  0.7773036  0.8653682 0.45150971  0.9357074  0.8436837  0.9048289
##                     cg02981548 cg04577745 cg12434901 cg12421087 cg11835797 cg27224751 cg02627240 cg11109139 cg07456472 cg09247979 cg07138269 cg01802772 cg09518270 cg17429539 cg12776173 cg26052728
## 200223270003_R03C01  0.5220037  0.2681033  0.8458468  0.5399655  0.9007408 0.03214912 0.57129408  0.6350109  0.5856904  0.5706177  0.9426707 0.02361869  0.8870663  0.7100923  0.8730635  0.1513937
## 200223270003_R06C01  0.5098965  0.8570624  0.8299579  0.5400348  0.8944957 0.83123722 0.05309659  0.6904482  0.3886482  0.5090215  0.5057781 0.02401520  0.8765622  0.7660838  0.7009491  0.5254754
## 200223270003_R07C01  0.5660985  0.9002276  0.8482994  0.5291975  0.8168544 0.79732117 0.52179136  0.6274025  0.9186405  0.5066661  0.9400527 0.02200957  0.8135001  0.6984969  0.1136716  0.5600724
##                     cg03628603 cg15501526 cg14465143 cg01130884 cg08397053 cg11716267 cg12074150 cg00051154 cg18861767 cg25758034 cg21575308 cg03327352 cg03057303 cg04073914 cg04664583 cg00156497
## 200223270003_R03C01  0.9157246  0.6319253  0.5543068  0.6230659 0.04199567 0.04959702 0.18602738 0.08370609  0.7847380  0.6649219 0.44702405  0.8786878  0.8923039 0.03089677  0.5881190  0.5194653
## 200223270003_R06C01  0.8851075  0.7435100  0.2702875  0.2847748 0.04437741 0.49143010 0.14231506 0.61288950  0.4734572  0.2393844 0.44792570  0.3042310  0.4954311 0.89962516  0.9352717  0.9024063
## 200223270003_R07C01  0.8923890  0.7756577  0.2621492  0.2313285 0.59796746 0.45857830 0.09201303 0.07638127  0.7312175  0.7071501 0.02822675  0.8273211  0.4695066 0.47195215  0.9350230  0.9067989
##                     cg17002338 cg04845852 cg12738248 cg12466610 cg14609402 cg01097733 cg12012426 cg04033559 cg17811452 cg16310958 cg20300784 cg02489327 cg23813394 cg00332268 cg06012621 cg23840008
## 200223270003_R03C01  0.2684163  0.9212268 0.88010292 0.59131778  0.9087631  0.6753081  0.9434768  0.8768243 0.82740141  0.9300073 0.86609999  0.8616312 0.48811365  0.9044887  0.8579519 0.66547425
## 200223270003_R06C01  0.2811103  0.5118209 0.51121855 0.06939623  0.9109735  0.9131513  0.9220044  0.8257388 0.09338396  0.9228871 0.03091187  0.8777949 0.02943436  0.5777209  0.5325037 0.88483246
## 200223270003_R07C01  0.2706349  0.9034373 0.09131476 0.04527733  0.9099145  0.6832952  0.9241284  0.8900962 0.79817238  0.8539019 0.90319796  0.4205073 0.92935625  0.5848006  0.6263080 0.09020907
##                     cg27341708 cg20094343 cg27577781 cg22681945 cg03167407 cg16089727 cg02823329 cg23947654 cg04768387 cg10844498 cg03359067 cg14170504 cg17906851 cg12333628 cg12284872 cg05351360
## 200223270003_R03C01 0.02613847  0.7128750  0.8113185  0.8388195  0.7610292 0.54996692  0.6464005  0.8079296  0.9465814  0.1391318  0.8628564 0.02236650  0.9529718  0.9092861  0.7414569 0.03855181
## 200223270003_R06C01 0.86893582  0.3291595  0.8144274  0.8700500  0.3087606 0.05876736  0.9633930  0.8017579  0.9098563  0.1385549  0.8144536 0.02988245  0.6462151  0.5084647  0.7725267 0.76395533
## 200223270003_R07C01 0.02642300  0.4013815  0.7970617  0.3344105  0.2455453 0.85485461  0.6617541  0.7584946  0.9413240  0.7374725  0.8737908 0.48543531  0.9553497  0.5229394  0.7573369 0.77000888
##                     cg19248407 cg15535896 cg24422984 cg18310072 cg27639199 cg26081710 cg06032337 cg04771146 cg24638099 cg18029737 cg09993718 cg04867412 cg12689021 cg20070588 cg16020483 cg14181112
## 200223270003_R03C01  0.8313131  0.9253926  0.5462594  0.1449858 0.67552763  0.9198212  0.5657198  0.7648566  0.4262170  0.9016634  0.7227856  0.8796800  0.7449475  0.5057088  0.1673606  0.1615405
## 200223270003_R06C01  0.8525281  0.3320191  0.5193121  0.9321264 0.06233093  0.8801892  0.5653758  0.3125007  0.8787392  0.7376586  0.4378752  0.4497115  0.7872237  0.8654344  0.1209622  0.3424621
## 200223270003_R07C01  0.8467857  0.9409104  0.1970387  0.9108063 0.05701332  0.9153264  0.5229594  0.2909958  0.8682765  0.9397667  0.7067889  0.4445373  0.7523141  0.8425849  0.2499647  0.2178314
##                     cg01608425 cg10829391 cg13375589 cg05161773 cg21757617 cg05125667 cg10985055 cg17348244 cg12293347 cg16733676 cg05813498
## 200223270003_R03C01  0.9264388  0.5929616  0.4578240  0.4154907  0.4429909 0.54151552  0.8631895 0.81793075  0.9253031  0.8904541  0.9039353
## 200223270003_R06C01  0.8887753  0.9411947  0.6025638  0.8526849  0.4472538 0.49090787  0.5456633 0.07241099  0.9176094  0.1698111  0.6252849
## 200223270003_R07C01  0.9065432  0.9322956  0.8182629  0.4259275  0.4339315 0.01590936  0.8825100 0.78025001  0.6028463  0.9203317  0.9086932
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

9.2.2. Logistic Regression Model

9.2.2.1 Logistic Regression Model Training

df_LRM1 <- processed_data
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)

set.seed(123) 
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 221 251
dim(testData)
## [1]  94 251
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")

cm_FeatEval_Median_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Median_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        6
##   Dementia  2       22
##                                           
##                Accuracy : 0.9149          
##                  95% CI : (0.8392, 0.9625)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 5.403e-07       
##                                           
##                   Kappa : 0.7878          
##                                           
##  Mcnemar's Test P-Value : 0.2888          
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.7857          
##          Pos Pred Value : 0.9143          
##          Neg Pred Value : 0.9167          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7447          
##       Balanced Accuracy : 0.8777          
##                                           
##        'Positive' Class : CN              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Median_LRM1_Accuracy <- cm_FeatEval_Median_LRM1$overall["Accuracy"]
cm_FeatEval_Median_LRM1_Kappa <- cm_FeatEval_Median_LRM1$overall["Kappa"]

print(cm_FeatEval_Median_LRM1_Accuracy)
##  Accuracy 
## 0.9148936
print(cm_FeatEval_Median_LRM1_Kappa)
##     Kappa 
## 0.7878104
print(model_LRM1)
## glmnet 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa      
##   0.10   0.006203963  0.8552525   0.62699702
##   0.10   0.019618654  0.8507071   0.61066284
##   0.10   0.062039630  0.8146465   0.50774214
##   0.55   0.006203963  0.7467677   0.36111277
##   0.55   0.019618654  0.7424242   0.31985504
##   0.55   0.062039630  0.6607071   0.02328114
##   1.00   0.006203963  0.6926263   0.23375764
##   1.00   0.019618654  0.6610101   0.11375353
##   1.00   0.062039630  0.6652525  -0.05413455
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.006203963.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Median_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Median_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The mean CV accuracy across all tuning parameter combinations is:")
## [1] "The mean CV accuracy across all tuning parameter combinations is:"
print(mean_accuracy_model_LRM1)
## [1] 0.743266
FeatEval_Median_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Median_mean_accuracy_cv_LRM1)
## [1] 0.743266
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9838
## [1] "The auc value is:"
## Area under the curve: 0.9838

if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # Use one colour vector so the curves and the legend stay consistent
  cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_LRM1_AUC <-mean_auc
}
print(FeatEval_Median_LRM1_AUC)
## Area under the curve: 0.9838
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC1         100.00
## PC2          52.41
## cg02872767   36.20
## cg11787167   33.40
## cg09216282   33.29
## cg01680303   30.93
## cg12108278   29.78
## cg19503462   29.74
## cg12080266   29.48
## cg02356645   29.12
## cg07152869   26.91
## cg06378561   26.47
## cg03084184   25.73
## cg26739327   25.17
## cg14780448   25.15
## cg06864789   25.08
## cg01013522   24.85
## cg02932958   24.80
## cg12858518   24.03
## cg04109990   23.93
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
# Note: dplyr::arrange() drops data-frame row names, which is why the
# table printed below shows importance scores without feature names;
# copy rownames() into a column first if the names must be kept.
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)
  
}
##        Overall
## 1   4.61333242
## 2   2.41778350
## 3   1.67020324
## 4   1.54100265
## 5   1.53569474
## 6   1.42684947
## 7   1.37379733
## 8   1.37214079
## 9   1.36009424
## 10  1.34318426
## 11  1.24165754
## 12  1.22127191
## 13  1.18713299
## 14  1.16107694
## 15  1.16007278
## 16  1.15700407
## 17  1.14648908
## 18  1.14423193
## 19  1.10838388
## 20  1.10412516
## 21  1.08557990
## 22  1.07219665
## 23  1.04057544
## 24  0.99483171
## 25  0.99331370
## 26  0.98541626
## 27  0.97484304
## 28  0.97440363
## 29  0.97319497
## 30  0.93575249
## 31  0.93163973
## 32  0.92880290
## 33  0.92224760
## 34  0.91394059
## 35  0.91219072
## 36  0.90569631
## 37  0.90234106
## 38  0.90120204
## 39  0.88130426
## 40  0.88086507
## 41  0.87916361
## 42  0.87162523
## 43  0.84700127
## 44  0.83221685
## 45  0.82637006
## 46  0.81985998
## 47  0.80815172
## 48  0.79715057
## 49  0.78888926
## 50  0.78235269
## 51  0.77812623
## 52  0.77348667
## 53  0.76283969
## 54  0.76081645
## 55  0.75738127
## 56  0.75722239
## 57  0.75122797
## 58  0.74722488
## 59  0.73942862
## 60  0.73834004
## 61  0.73276091
## 62  0.72681056
## 63  0.72642749
## 64  0.72252593
## 65  0.72036526
## 66  0.71738987
## 67  0.71444336
## 68  0.71397901
## 69  0.70558730
## 70  0.69566658
## 71  0.69532100
## 72  0.68932057
## 73  0.67250400
## 74  0.65542598
## 75  0.65255650
## 76  0.64990694
## 77  0.64894355
## 78  0.64692580
## 79  0.63766453
## 80  0.63635530
## 81  0.63353840
## 82  0.63155262
## 83  0.62654849
## 84  0.62209041
## 85  0.61656191
## 86  0.61459044
## 87  0.61400895
## 88  0.60637872
## 89  0.60263946
## 90  0.59454923
## 91  0.59418553
## 92  0.59223433
## 93  0.55463990
## 94  0.55284838
## 95  0.54900122
## 96  0.54787094
## 97  0.54699576
## 98  0.53828374
## 99  0.53669204
## 100 0.53446633
## 101 0.52867688
## 102 0.51810130
## 103 0.51776249
## 104 0.51667745
## 105 0.51395264
## 106 0.51025710
## 107 0.51011432
## 108 0.50754304
## 109 0.49673499
## 110 0.49405474
## 111 0.49303309
## 112 0.48878538
## 113 0.48508004
## 114 0.47767519
## 115 0.46924401
## 116 0.45547789
## 117 0.45113132
## 118 0.44855415
## 119 0.44439913
## 120 0.44046888
## 121 0.43950756
## 122 0.43393175
## 123 0.43130061
## 124 0.42552891
## 125 0.42214893
## 126 0.41797570
## 127 0.41453616
## 128 0.41013528
## 129 0.40944666
## 130 0.40105185
## 131 0.39233827
## 132 0.39094294
## 133 0.39065874
## 134 0.38833995
## 135 0.38758323
## 136 0.38540119
## 137 0.38422693
## 138 0.38283205
## 139 0.38136860
## 140 0.37961499
## 141 0.37834468
## 142 0.37749032
## 143 0.37407563
## 144 0.37297410
## 145 0.36922875
## 146 0.36868795
## 147 0.36500495
## 148 0.36336321
## 149 0.35683077
## 150 0.35545711
## 151 0.34944847
## 152 0.33664722
## 153 0.33566758
## 154 0.33444093
## 155 0.32812077
## 156 0.31482524
## 157 0.30840712
## 158 0.30603128
## 159 0.30490885
## 160 0.30404725
## 161 0.29916215
## 162 0.29480891
## 163 0.29310002
## 164 0.29008646
## 165 0.28901850
## 166 0.28004330
## 167 0.27516822
## 168 0.27207282
## 169 0.25542372
## 170 0.25356988
## 171 0.24534244
## 172 0.24397670
## 173 0.23839562
## 174 0.23737586
## 175 0.23635178
## 176 0.23208419
## 177 0.23191882
## 178 0.23170526
## 179 0.23125670
## 180 0.23064312
## 181 0.22927752
## 182 0.20065062
## 183 0.19817211
## 184 0.19510570
## 185 0.19210354
## 186 0.19074274
## 187 0.18469844
## 188 0.17861347
## 189 0.17673888
## 190 0.17364910
## 191 0.17341407
## 192 0.17311366
## 193 0.17019149
## 194 0.16758407
## 195 0.16709115
## 196 0.14918388
## 197 0.14883106
## 198 0.14754640
## 199 0.14225921
## 200 0.13701627
## 201 0.12917386
## 202 0.12815814
## 203 0.12772743
## 204 0.12417602
## 205 0.11817188
## 206 0.11579635
## 207 0.11322787
## 208 0.11287725
## 209 0.10794814
## 210 0.08802875
## 211 0.08135072
## 212 0.07995182
## 213 0.07602732
## 214 0.06952493
## 215 0.06888567
## 216 0.05622809
## 217 0.05508407
## 218 0.04463766
## 219 0.04228514
## 220 0.04152291
## 221 0.02777733
## 222 0.02593844
## 223 0.02064349
## 224 0.01988719
## 225 0.01493663
## 226 0.01028869
## 227 0.00000000
## 228 0.00000000
## 229 0.00000000
## 230 0.00000000
## 231 0.00000000
## 232 0.00000000
## 233 0.00000000
## 234 0.00000000
## 235 0.00000000
## 236 0.00000000
## 237 0.00000000
## 238 0.00000000
## 239 0.00000000
## 240 0.00000000
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # value across the classes and add it as the ranking column.
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
# require() attaches reshape2 when it is available, so no else branch is needed
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features, ranked by maximum importance across classes:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
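The frequency / "common features" selection described in the version notes (keep the features that appear in the top-N list of more than half of the models) can be sketched as follows; the model names and top lists below are hypothetical toy inputs, standing in for the real per-model varImp rankings:

```r
# Frequency-based "common features" selection: count how often each
# feature appears across the per-model top-N lists, and keep those
# appearing in more than half of the models.
select_common_features <- function(top_lists) {
  freq <- table(unlist(top_lists))
  names(freq[freq > length(top_lists) / 2])
}

# Hypothetical per-model top lists (real ones come from varImp rankings)
top_lists <- list(
  model_A = c("cg24861747", "PC1", "cg01013522"),
  model_B = c("cg24861747", "PC1", "cg09650803"),
  model_C = c("cg24861747", "cg01013522", "cg99999999")
)
select_common_features(top_lists)
# cg24861747 (3/3), PC1 (2/3) and cg01013522 (2/3) pass the > 1/2 cutoff
```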

9.2.2.2 Model Diagnosis & Improvement

9.2.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia 
##      221       94
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia 
## 0.7015873 0.2984127
table(trainData$DX)
## 
##       CN Dementia 
##      155       66
prop.table(table(trainData$DX))
## 
##        CN  Dementia 
## 0.7013575 0.2986425
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 2.351064
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 2.348485
  • Let’s run a Chi-square test, which determines whether the class distribution deviates significantly from a balanced one. The p-value reported by the test indicates the significance of the class imbalance.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 51.203, df = 1, p-value = 8.328e-13
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 35.842, df = 1, p-value = 2.14e-09
Address Class Imbalance with SMOTE (NOT FINALIZED YET; MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia 
##      155      132
dim(balanced_data_LGR_1)
## [1] 287 251
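An alternative to SMOTE worth trying here is inverse-frequency case weighting, which rebalances the loss instead of synthesizing samples. A minimal sketch on toy labels matching the training class counts above (in the pipeline the weights would be built from `trainData$DX` and supplied via the `weights` argument of `caret::train`):

```r
# Inverse-frequency case weights: each observation is weighted by
# 1 / (size of its class), so both classes carry equal total weight.
DX <- factor(c(rep("CN", 155), rep("Dementia", 66)))  # toy labels
class_freq <- table(DX)
case_weights <- as.numeric(1 / class_freq[as.character(DX)])

# Both classes now sum to the same total weight (1 each)
tapply(case_weights, DX, sum)

# The weights could then be supplied to the existing training call:
#   caret::train(DX ~ ., data = trainData, method = "glmnet",
#                weights = case_weights, trControl = ctrl)
```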
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        3
##   Dementia  2       25
##                                           
##                Accuracy : 0.9468          
##                  95% CI : (0.8802, 0.9825)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 3.16e-09        
##                                           
##                   Kappa : 0.8715          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.8929          
##          Pos Pred Value : 0.9552          
##          Neg Pred Value : 0.9259          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7128          
##       Balanced Accuracy : 0.9313          
##                                           
##        'Positive' Class : CN              
## 
print(model_LRM2)
## glmnet 
## 
## 287 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 229, 230, 229, 230, 230 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0002846391  0.9371446  0.8742739
##   0.10   0.0028463905  0.9371446  0.8742739
##   0.10   0.0284639052  0.9336358  0.8670920
##   0.55   0.0002846391  0.8745312  0.7501268
##   0.55   0.0028463905  0.8710224  0.7432505
##   0.55   0.0284639052  0.8465215  0.6947837
##   1.00   0.0002846391  0.8500302  0.7021715
##   1.00   0.0028463905  0.8465215  0.6954184
##   1.00   0.0284639052  0.7733817  0.5465130
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.002846391.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8744371
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC1         100.00
## PC2          55.26
## cg02872767   40.29
## cg11787167   33.20
## cg12108278   32.54
## cg01680303   32.03
## cg12080266   29.80
## cg09216282   29.64
## cg19503462   28.98
## cg07152869   28.57
## cg02356645   27.45
## cg12858518   27.26
## cg06378561   27.02
## cg02932958   26.40
## cg26739327   24.67
## cg17623720   24.62
## cg03084184   24.16
## cg23432430   23.87
## cg04124201   23.76
## cg01013522   23.13
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4|| METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
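The importance table printed next shows bare row numbers instead of CpG names: `dplyr::arrange()` drops the data.frame rownames in which `varImp()` stores the feature names. A minimal sketch (toy values, hypothetical CpG identifiers) of two ways to keep the names attached while sorting:

```r
# Toy importance table with hypothetical CpG names as rownames
imp <- data.frame(Overall = c(0.2, 6.3, 1.1),
                  row.names = c("cg0000001", "cg0000002", "cg0000003"))

# Base R: indexing by order() preserves rownames
imp_sorted <- imp[order(-imp$Overall), , drop = FALSE]

# dplyr alternative: promote rownames to a column before arranging, e.g.
#   imp %>% tibble::rownames_to_column("Feature") %>% arrange(desc(Overall))
```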
##          Overall
## 1   6.2690553246
## 2   3.4641478546
## 3   2.5259964781
## 4   2.0815532125
## 5   2.0400036671
## 6   2.0077841512
## 7   1.8679103465
## 8   1.8579515495
## 9   1.8169831358
## 10  1.7911665622
##  [ rows 11-250 omitted: importance decreases monotonically toward 0; rows 230-250 are exactly 0 ]
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case,
  # keep the maximum importance value across classes for each feature
  # and add it as a MaxImportance column
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("The top 20 features based on the maximum-importance method:")
  print(head(importance_model_LRM2_df, n = 20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9843
## [1] "The auc value is:"
## Area under the curve: 0.9843
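pROC emits "Setting direction: controls > cases" because the ROC direction is inferred from the data; pinning `levels` and `direction` silences the message and keeps AUC values comparable across different splits. A self-contained sketch on toy data (hypothetical labels and probabilities):

```r
library(pROC)

# Toy response and predicted probabilities (hypothetical values)
resp <- factor(c("CN", "CN", "Dementia", "Dementia"),
               levels = c("CN", "Dementia"))
prob_dementia <- c(0.1, 0.3, 0.7, 0.9)

# direction = "<": controls (first level) are expected to score lower
roc_fixed <- roc(resp, prob_dementia,
                 levels = c("CN", "Dementia"), direction = "<")
auc(roc_fixed)
```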

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # colors 2..(length(classes) + 1) match the curves drawn above
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}

9.2.3. Elastic Net

9.2.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa      
##   0      0.00100000  0.7961616   0.38677647
##   0      0.05357895  0.7961616   0.38677647
##   0      0.10615789  0.7961616   0.38677647
##   0      0.15873684  0.7961616   0.38677647
##   0      0.21131579  0.7961616   0.38677647
##   0      0.26389474  0.7961616   0.38677647
##   0      0.31647368  0.7961616   0.38677647
##   0      0.36905263  0.7961616   0.38677647
##   0      0.42163158  0.7961616   0.38677647
##   0      0.47421053  0.7961616   0.38677647
##   0      0.52678947  0.7961616   0.38677647
##   0      0.57936842  0.7961616   0.38677647
##   0      0.63194737  0.7961616   0.38677647
##   0      0.68452632  0.7961616   0.38677647
##   0      0.73710526  0.7961616   0.38677647
##   0      0.78968421  0.7961616   0.38677647
##   0      0.84226316  0.7961616   0.38677647
##   0      0.89484211  0.7961616   0.38677647
##   0      0.94742105  0.7961616   0.38677647
##   0      1.00000000  0.7961616   0.38677647
##   1      0.00100000  0.7106061   0.29203569
##   1      0.05357895  0.6607071  -0.06294811
##   1      0.10615789  0.7014141   0.00000000
##   1      0.15873684  0.7014141   0.00000000
##   1      0.21131579  0.7014141   0.00000000
##   1      0.26389474  0.7014141   0.00000000
##   1      0.31647368  0.7014141   0.00000000
##   1      0.36905263  0.7014141   0.00000000
##   1      0.42163158  0.7014141   0.00000000
##   1      0.47421053  0.7014141   0.00000000
##   1      0.52678947  0.7014141   0.00000000
##   1      0.57936842  0.7014141   0.00000000
##   1      0.63194737  0.7014141   0.00000000
##   1      0.68452632  0.7014141   0.00000000
##   1      0.73710526  0.7014141   0.00000000
##   1      0.78968421  0.7014141   0.00000000
##   1      0.84226316  0.7014141   0.00000000
##   1      0.89484211  0.7014141   0.00000000
##   1      0.94742105  0.7014141   0.00000000
##   1      1.00000000  0.7014141   0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 1.
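Note that `alpha = 0:1` only evaluates pure ridge (alpha = 0) and pure lasso (alpha = 1), so this grid never tries an intermediate elastic-net mixture; the selected alpha = 0, lambda = 1 also sits at the edge of the lambda range, suggesting the range may be too narrow. A hypothetical finer grid could look like:

```r
# Intermediate alpha values give true elastic-net mixtures, and a
# log-spaced lambda covers several orders of magnitude (toy values)
param_grid_fine <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                               lambda = 10^seq(-3, 1, length.out = 20))
```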
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.748
FeatEval_Median_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Median_mean_accuracy_cv_ENM1)
## [1] 0.748
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")


train_accuracy <- mean(train_predictions == trainData_ENM1$DX)


FeatEval_Median_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.932126696832579"
print(FeatEval_Median_ENM1_trainAccuracy)
## [1] 0.9321267
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Median_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Median_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       14
##   Dementia  0       14
##                                           
##                Accuracy : 0.8511          
##                  95% CI : (0.7628, 0.9161)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.0006368       
##                                           
##                   Kappa : 0.5841          
##                                           
##  Mcnemar's Test P-Value : 0.0005120       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.5000          
##          Pos Pred Value : 0.8250          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.7021          
##    Detection Prevalence : 0.8511          
##       Balanced Accuracy : 0.7500          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Median_ENM1_Accuracy<-cm_FeatEval_Median_ENM1$overall["Accuracy"]
cm_FeatEval_Median_ENM1_Kappa<-cm_FeatEval_Median_ENM1$overall["Kappa"]
print(cm_FeatEval_Median_ENM1_Accuracy)
##  Accuracy 
## 0.8510638
print(cm_FeatEval_Median_ENM1_Kappa)
##     Kappa 
## 0.5840708
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC1         100.00
## PC3          62.45
## PC2          52.79
## cg07152869   43.36
## cg19503462   40.21
## cg09216282   40.01
## cg02872767   36.63
## cg04109990   36.19
## cg11787167   35.92
## cg26757229   35.20
## cg01013522   34.83
## cg26739327   34.63
## cg04124201   34.37
## cg12858518   34.32
## cg02356645   33.53
## cg06864789   33.25
## cg03982462   32.98
## cg01680303   32.45
## cg15775217   32.23
## cg00322003   31.95
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG ==4 ||  METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
  
  
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)

Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))


print(Ordered_importance_elastic_net_final_model1) 
  
}
##         Overall
## 1   0.574971088
## 2   0.361366041
## 3   0.306391562
## 4   0.252746758
## 5   0.234824961
## 6   0.233714806
## 7   0.214454049
## 8   0.211983939
## 9   0.210428678
## 10  0.206303348
##  [ rows 11-250 omitted: importance decreases monotonically to 0.006083151 ]
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case,
  # keep the maximum importance value across classes for each feature
  # and add it as a MaxImportance column
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))


  print(importance_elastic_net_model1_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("The top 20 features based on the maximum-importance method:")
  print(head(importance_elastic_net_model1_df, n = 20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.9935

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # colors 2..(length(classes) + 1) match the curves drawn above
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_ENM1_AUC <- mean_auc
}
print(FeatEval_Median_ENM1_AUC)
## Area under the curve: 0.9935

9.2.4. XGBoost

9.2.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
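One thing this section never does is release the parallel workers once training is finished. A self-contained sketch of the create/use/stop pattern with the base `parallel` package (a toy `parLapply` call stands in for the `caret::train` call); with `doParallel` as above, the equivalent cleanup after training would be `stopCluster(c2)` followed by `foreach::registerDoSEQ()`:

```r
library(parallel)

cl <- makeCluster(2)                          # start two workers
res <- parLapply(cl, 1:3, function(x) x^2)    # stand-in for the training call
stopCluster(cl)                               # always release workers when done
```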
## eXtreme Gradient Boosting 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa     
##   0.3  1          0.6               0.50        50      0.6967677  0.20914840
##   0.3  1          0.6               0.50       100      0.7285859  0.27052079
##   0.3  1          0.6               0.50       150      0.7149495  0.24775990
##   0.3  1          0.6               0.75        50      0.6966667  0.16118078
##   0.3  1          0.6               0.75       100      0.7466667  0.28746028
##   0.3  1          0.6               0.75       150      0.7558586  0.32748673
##   0.3  1          0.6               1.00        50      0.6876768  0.12339538
##   0.3  1          0.6               1.00       100      0.7060606  0.19795596
##   0.3  1          0.6               1.00       150      0.7196970  0.22378606
##   0.3  1          0.8               0.50        50      0.7239394  0.28679001
##   0.3  1          0.8               0.50       100      0.7332323  0.29040350
##   0.3  1          0.8               0.50       150      0.7375758  0.29647349
##   0.3  1          0.8               0.75        50      0.7012121  0.16488567
##   0.3  1          0.8               0.75       100      0.7284848  0.25343853
##   0.3  1          0.8               0.75       150      0.7419192  0.28177831
##   0.3  1          0.8               1.00        50      0.6924242  0.12082988
##   0.3  1          0.8               1.00       100      0.6923232  0.16258024
##   0.3  1          0.8               1.00       150      0.7284848  0.25456813
##   0.3  2          0.6               0.50        50      0.7062626  0.21213677
##   0.3  2          0.6               0.50       100      0.7287879  0.27541271
##   0.3  2          0.6               0.50       150      0.7197980  0.25127672
##   0.3  2          0.6               0.75        50      0.7014141  0.15198065
##   0.3  2          0.6               0.75       100      0.7240404  0.21424147
##   0.3  2          0.6               0.75       150      0.7284848  0.22315004
##   0.3  2          0.6               1.00        50      0.7468687  0.28503107
##   0.3  2          0.6               1.00       100      0.7467677  0.29047692
##   0.3  2          0.6               1.00       150      0.7647475  0.33048224
##   0.3  2          0.8               0.50        50      0.7060606  0.19814498
##   0.3  2          0.8               0.50       100      0.7467677  0.32182480
##   0.3  2          0.8               0.50       150      0.7467677  0.33004633
##   0.3  2          0.8               0.75        50      0.7149495  0.18835911
##   0.3  2          0.8               0.75       100      0.7331313  0.24205663
##   0.3  2          0.8               0.75       150      0.7286869  0.22490940
##   0.3  2          0.8               1.00        50      0.6790909  0.09161510
##   0.3  2          0.8               1.00       100      0.7197980  0.20018036
##   0.3  2          0.8               1.00       150      0.7197980  0.21206309
##   0.3  3          0.6               0.50        50      0.7336364  0.25623966
##   0.3  3          0.6               0.50       100      0.7425253  0.27222246
##   0.3  3          0.6               0.50       150      0.7424242  0.27667878
##   0.3  3          0.6               0.75        50      0.7376768  0.24778571
##   0.3  3          0.6               0.75       100      0.7423232  0.27164458
##   0.3  3          0.6               0.75       150      0.7376768  0.25567299
##   0.3  3          0.6               1.00        50      0.6966667  0.17460208
##   0.3  3          0.6               1.00       100      0.7103030  0.18210872
##   0.3  3          0.6               1.00       150      0.7104040  0.17998986
##   0.3  3          0.8               0.50        50      0.6745455  0.08387408
##   0.3  3          0.8               0.50       100      0.7151515  0.21286216
##   0.3  3          0.8               0.50       150      0.7106061  0.22366878
##   0.3  3          0.8               0.75        50      0.7061616  0.17629136
##   0.3  3          0.8               0.75       100      0.7014141  0.15776280
##   0.3  3          0.8               0.75       150      0.7015152  0.16537782
##   0.3  3          0.8               1.00        50      0.7197980  0.20093730
##   0.3  3          0.8               1.00       100      0.7334343  0.22643177
##   0.3  3          0.8               1.00       150      0.7244444  0.20994426
##   0.4  1          0.6               0.50        50      0.7285859  0.28122743
##   0.4  1          0.6               0.50       100      0.7603030  0.37842922
##   0.4  1          0.6               0.50       150      0.7647475  0.39827998
##   0.4  1          0.6               0.75        50      0.7060606  0.18425818
##   0.4  1          0.6               0.75       100      0.7285859  0.23758218
##   0.4  1          0.6               0.75       150      0.7376768  0.29122504
##   0.4  1          0.6               1.00        50      0.7015152  0.17419790
##   0.4  1          0.6               1.00       100      0.7286869  0.26960853
##   0.4  1          0.6               1.00       150      0.7241414  0.23963715
##   0.4  1          0.8               0.50        50      0.7013131  0.22894207
##   0.4  1          0.8               0.50       100      0.7058586  0.23481156
##   0.4  1          0.8               0.50       150      0.7149495  0.26477260
##   0.4  1          0.8               0.75        50      0.6924242  0.16877835
##   0.4  1          0.8               0.75       100      0.7104040  0.23366187
##   0.4  1          0.8               0.75       150      0.7057576  0.21906043
##   0.4  1          0.8               1.00        50      0.7194949  0.19866032
##   0.4  1          0.8               1.00       100      0.7195960  0.24120645
##   0.4  1          0.8               1.00       150      0.7467677  0.29658209
##   0.4  2          0.6               0.50        50      0.7328283  0.28343386
##   0.4  2          0.6               0.50       100      0.7510101  0.32017319
##   0.4  2          0.6               0.50       150      0.7555556  0.33550880
##   0.4  2          0.6               0.75        50      0.7194949  0.18960549
##   0.4  2          0.6               0.75       100      0.7422222  0.27267351
##   0.4  2          0.6               0.75       150      0.7422222  0.27267351
##   0.4  2          0.6               1.00        50      0.7286869  0.21485989
##   0.4  2          0.6               1.00       100      0.7284848  0.22250675
##   0.4  2          0.6               1.00       150      0.7284848  0.22250675
##   0.4  2          0.8               0.50        50      0.6923232  0.19706305
##   0.4  2          0.8               0.50       100      0.7331313  0.27718099
##   0.4  2          0.8               0.50       150      0.7422222  0.29801813
##   0.4  2          0.8               0.75        50      0.7151515  0.18997643
##   0.4  2          0.8               0.75       100      0.7151515  0.19399901
##   0.4  2          0.8               0.75       150      0.7105051  0.16564706
##   0.4  2          0.8               1.00        50      0.7240404  0.18640875
##   0.4  2          0.8               1.00       100      0.7058586  0.14586140
##   0.4  2          0.8               1.00       150      0.7058586  0.14586140
##   0.4  3          0.6               0.50        50      0.7015152  0.18894559
##   0.4  3          0.6               0.50       100      0.7195960  0.25281924
##   0.4  3          0.6               0.50       150      0.7331313  0.27952696
##   0.4  3          0.6               0.75        50      0.7241414  0.23621310
##   0.4  3          0.6               0.75       100      0.7241414  0.23454755
##   0.4  3          0.6               0.75       150      0.7241414  0.23454755
##   0.4  3          0.6               1.00        50      0.7148485  0.18705928
##   0.4  3          0.6               1.00       100      0.7238384  0.22013354
##   0.4  3          0.6               1.00       150      0.7238384  0.22013354
##   0.4  3          0.8               0.50        50      0.7374747  0.28371088
##   0.4  3          0.8               0.50       100      0.7421212  0.29305140
##   0.4  3          0.8               0.50       150      0.7421212  0.29632450
##   0.4  3          0.8               0.75        50      0.7015152  0.12296970
##   0.4  3          0.8               0.75       100      0.7105051  0.17314724
##   0.4  3          0.8               0.75       150      0.7105051  0.17459116
##   0.4  3          0.8               1.00        50      0.7283838  0.23412497
##   0.4  3          0.8               1.00       100      0.7420202  0.27071392
##   0.4  3          0.8               1.00       150      0.7420202  0.27071392
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 1, eta = 0.4, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.7226983
FeatEval_Median_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Median_mean_accuracy_cv_xgb)
## [1] 0.7226983
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Median_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Median_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Median_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Median_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       60       16
##   Dementia  6       12
##                                           
##                Accuracy : 0.766           
##                  95% CI : (0.6674, 0.8471)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.10563         
##                                           
##                   Kappa : 0.3764          
##                                           
##  Mcnemar's Test P-Value : 0.05501         
##                                           
##             Sensitivity : 0.9091          
##             Specificity : 0.4286          
##          Pos Pred Value : 0.7895          
##          Neg Pred Value : 0.6667          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6383          
##    Detection Prevalence : 0.8085          
##       Balanced Accuracy : 0.6688          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Median_xgb_Accuracy <-cm_FeatEval_Median_xgb$overall["Accuracy"]
cm_FeatEval_Median_xgb_Kappa <-cm_FeatEval_Median_xgb$overall["Kappa"]

print(cm_FeatEval_Median_xgb_Accuracy)
##  Accuracy 
## 0.7659574
print(cm_FeatEval_Median_xgb_Kappa)
##     Kappa 
## 0.3763571
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## cg10542624  100.00
## cg16390578   67.59
## cg25561557   62.51
## cg24861747   62.34
## cg06697310   61.69
## cg04124201   60.97
## cg13885788   56.33
## cg02095601   56.29
## cg14609402   54.97
## cg21757617   54.90
## cg04109990   54.37
## cg05096415   53.89
## cg15775217   51.58
## cg19512141   48.58
## cg24859648   47.62
## cg12080266   46.45
## PC1          45.95
## cg22901347   44.59
## cg05373298   43.23
## cg05841700   42.50
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain       Cover   Frequency   Importance
##          <char>        <num>       <num>       <num>        <num>
##   1: cg10542624 0.0503357981 0.032098893 0.020833333 0.0503357981
##   2: cg16390578 0.0340210219 0.029479970 0.027777778 0.0340210219
##   3: cg25561557 0.0314634491 0.026293815 0.020833333 0.0314634491
##   4: cg24861747 0.0313783491 0.034904284 0.027777778 0.0313783491
##   5: cg06697310 0.0310521896 0.025508262 0.013888889 0.0310521896
##  ---                                                             
## 103: cg01013522 0.0004086777 0.001902347 0.006944444 0.0004086777
## 104: cg11314779 0.0003990846 0.001794902 0.006944444 0.0003990846
## 105: cg20913114 0.0003770611 0.001718225 0.006944444 0.0003770611
## 106: cg12776173 0.0003201644 0.001706139 0.006944444 0.0003201644
## 107: cg03359067 0.0003057801 0.001723554 0.006944444 0.0003057801
stopCluster(c2)
registerDoSEQ()
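As described in the version notes, the frequency / common-feature rule keeps a feature when it appears in the top-N list of more than half of the trained models. A minimal sketch on hypothetical per-model top lists (the CpG names and the `top_lists` object are placeholders, not the real model output):

```r
# Hypothetical top-N feature lists from three trained models
top_lists <- list(
  xgb = c("cg10542624", "cg16390578", "cg25561557"),
  rf  = c("cg06864789", "cg10542624", "cg25561557"),
  svm = c("cg10542624", "cg17044529", "cg25561557")
)

# Step 2: count how often each feature appears across the models' top lists
freq <- table(unlist(top_lists))

# Step 3: keep features appearing in more than half of the models
common_features <- names(freq[freq > length(top_lists) / 2])
print(common_features)
```

Here cg10542624 and cg25561557 appear in all three lists, so they survive the more-than-half cutoff.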
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.8198

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  # One-versus-rest ROC curve and AUC for each class
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # Colors 2..(n+1) match the order in which the curves were drawn
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_xgb_AUC <-mean_auc
}
print(FeatEval_Median_xgb_AUC)
## Area under the curve: 0.8198
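The one-versus-rest blocks above reduce to a simple recipe: for each class, compute a binary AUC (that class versus the rest) and take the unweighted mean. A self-contained base-R sketch on made-up scores, using the rank-sum (Wilcoxon) identity for the AUC instead of pROC:

```r
# Wilcoxon / rank-sum form of the binary AUC (labels: 1 = positive class)
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up class labels and per-class probability scores (illustration only)
labels <- c("CN", "CN", "MCI", "Dementia", "MCI", "Dementia")
probs <- data.frame(
  CN       = c(0.8, 0.7, 0.2, 0.1, 0.2, 0.2),
  MCI      = c(0.1, 0.2, 0.6, 0.3, 0.3, 0.3),
  Dementia = c(0.1, 0.1, 0.2, 0.6, 0.5, 0.5)
)

# One-vs-rest AUC per class, then the unweighted (macro) mean
aucs <- sapply(names(probs), function(cl) auc_rank(as.integer(labels == cl), probs[[cl]]))
mean_auc <- mean(aucs)
print(mean_auc)  # 0.9375 on this toy data
```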

9.2.5. Random Forest

9.2.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.7014141  0.00000000
##   126   0.7014141  0.02368421
##   250   0.7059596  0.04315201
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 250.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.7029293
FeatEval_Median_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Median_mean_accuracy_cv_rf)
## [1] 0.7029293
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")

train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Median_rf_trainAccuracy<-train_accuracy
print(FeatEval_Median_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Median_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Median_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       24
##   Dementia  0        4
##                                           
##                Accuracy : 0.7447          
##                  95% CI : (0.6443, 0.8291)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.2167          
##                                           
##                   Kappa : 0.1897          
##                                           
##  Mcnemar's Test P-Value : 2.668e-06       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.1429          
##          Pos Pred Value : 0.7333          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.7021          
##    Detection Prevalence : 0.9574          
##       Balanced Accuracy : 0.5714          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Median_rf_Accuracy<-cm_FeatEval_Median_rf$overall["Accuracy"]
print(cm_FeatEval_Median_rf_Accuracy)
##  Accuracy 
## 0.7446809
cm_FeatEval_Median_rf_Kappa<-cm_FeatEval_Median_rf$overall["Kappa"]
print(cm_FeatEval_Median_rf_Kappa)
##     Kappa 
## 0.1896552
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Importance
## cg06864789     100.00
## cg17044529      98.05
## cg12776173      95.20
## cg24861747      83.12
## cg21986118      81.32
## cg24851651      78.94
## cg10701746      75.08
## cg04831745      74.34
## cg07152869      73.51
## cg23836570      70.47
## cg02356645      68.82
## cg13885788      67.45
## cg04124201      67.13
## cg12333628      67.09
## cg20218135      66.66
## cg19248407      66.35
## cg02656016      66.35
## cg26948066      65.76
## PC1             65.16
## cg04218584      64.98
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5 ){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))

print(Ordered_importance_rf_final_model)
  
}
##               CN     Dementia
## 1    3.463383348  3.463383348
## 2    3.359407852  3.359407852
## 3    3.207690035  3.207690035
## 4    2.563915928  2.563915928
## 5    2.467975364  2.467975364
## 6    2.341433068  2.341433068
## 7    2.135506216  2.135506216
## 8    2.096228634  2.096228634
## 9    2.051900694  2.051900694
## 10   1.890184195  1.890184195
## 11   1.802170917  1.802170917
## 12   1.728821539  1.728821539
## 13   1.712021353  1.712021353
## 14   1.710006510  1.710006510
## 15   1.687218118  1.687218118
## 16   1.670611732  1.670611732
## 17   1.670525002  1.670525002
## 18   1.639242740  1.639242740
## 19   1.607083203  1.607083203
## 20   1.597595440  1.597595440
## 21   1.569811983  1.569811983
## 22   1.529483523  1.529483523
## 23   1.495373210  1.495373210
## 24   1.458532176  1.458532176
## 25   1.423924133  1.423924133
## 26   1.408186319  1.408186319
## 27   1.399626658  1.399626658
## 28   1.397741263  1.397741263
## 29   1.382593845  1.382593845
## 30   1.285744036  1.285744036
## 31   1.283333212  1.283333212
## 32   1.262571592  1.262571592
## 33   1.251823521  1.251823521
## 34   1.120621611  1.120621611
## 35   1.054250355  1.054250355
## 36   1.047204819  1.047204819
## 37   1.009541054  1.009541054
## 38   1.001343259  1.001343259
## 39   0.995851879  0.995851879
## 40   0.986021632  0.986021632
## 41   0.982390898  0.982390898
## 42   0.980420805  0.980420805
## 43   0.972960452  0.972960452
## 44   0.970601526  0.970601526
## 45   0.968667326  0.968667326
## 46   0.949903810  0.949903810
## 47   0.946816106  0.946816106
## 48   0.946369948  0.946369948
## 49   0.922778569  0.922778569
## 50   0.920793312  0.920793312
## 51   0.920445672  0.920445672
## 52   0.908335259  0.908335259
## 53   0.902366941  0.902366941
## 54   0.891253015  0.891253015
## 55   0.880323349  0.880323349
## 56   0.871303707  0.871303707
## 57   0.868552165  0.868552165
## 58   0.844786928  0.844786928
## 59   0.785175925  0.785175925
## 60   0.766989809  0.766989809
## 61   0.751661553  0.751661553
## 62   0.715221654  0.715221654
## 63   0.686846601  0.686846601
## 64   0.647999429  0.647999429
## 65   0.646775966  0.646775966
## 66   0.640838920  0.640838920
## 67   0.640737144  0.640737144
## 68   0.613594671  0.613594671
## 69   0.604010104  0.604010104
## 70   0.593431199  0.593431199
## 71   0.592165444  0.592165444
## 72   0.589999925  0.589999925
## 73   0.578922227  0.578922227
## 74   0.577012853  0.577012853
## 75   0.570591217  0.570591217
## 76   0.556238966  0.556238966
## 77   0.556055077  0.556055077
## 78   0.541522050  0.541522050
## 79   0.530614090  0.530614090
## 80   0.525333583  0.525333583
## 81   0.520451909  0.520451909
## 82   0.516233568  0.516233568
## 83   0.502264232  0.502264232
## 84   0.500394197  0.500394197
## 85   0.492210768  0.492210768
## 86   0.439088956  0.439088956
## 87   0.436160216  0.436160216
## 88   0.400973123  0.400973123
## 89   0.396103075  0.396103075
## 90   0.394220550  0.394220550
## 91   0.387427447  0.387427447
## 92   0.384590564  0.384590564
## 93   0.373497279  0.373497279
## 94   0.356539291  0.356539291
## 95   0.354433312  0.354433312
## 96   0.349107709  0.349107709
## 97   0.329278979  0.329278979
## 98   0.316213614  0.316213614
## 99   0.293030205  0.293030205
## 100  0.292050022  0.292050022
## 101  0.288213945  0.288213945
## 102  0.282075476  0.282075476
## 103  0.263989783  0.263989783
## 104  0.246462656  0.246462656
## 105  0.246203415  0.246203415
## 106  0.245470275  0.245470275
## 107  0.236026146  0.236026146
## 108  0.200487599  0.200487599
## 109  0.195067588  0.195067588
## 110  0.194578900  0.194578900
## 111  0.192997739  0.192997739
## 112  0.187396143  0.187396143
## 113  0.179899955  0.179899955
## 114  0.177877335  0.177877335
## 115  0.171928798  0.171928798
## 116  0.170290869  0.170290869
## 117  0.162261958  0.162261958
## 118  0.153472560  0.153472560
## 119  0.149823277  0.149823277
## 120  0.149650120  0.149650120
## 121  0.144537235  0.144537235
## 122  0.136843742  0.136843742
## 123  0.136071586  0.136071586
## 124  0.132378525  0.132378525
## 125  0.119350566  0.119350566
## 126  0.118553598  0.118553598
## 127  0.107921270  0.107921270
## 128  0.105237090  0.105237090
## 129  0.104164797  0.104164797
## 130  0.101836422  0.101836422
## 131  0.098841358  0.098841358
## 132  0.097381416  0.097381416
## 133  0.094009911  0.094009911
## 134  0.091320336  0.091320336
## 135  0.088204954  0.088204954
## 136  0.075001461  0.075001461
## 137  0.064861877  0.064861877
## 138  0.064630294  0.064630294
## 139  0.054713816  0.054713816
## 140  0.041612750  0.041612750
## 141  0.034774790  0.034774790
## 142  0.029422433  0.029422433
## 143  0.024623895  0.024623895
## 144  0.020926067  0.020926067
## 145  0.008203004  0.008203004
## 146  0.007142504  0.007142504
## 147  0.002340218  0.002340218
## 148 -0.009115822 -0.009115822
## 149 -0.015195795 -0.015195795
## 150 -0.017670237 -0.017670237
## 151 -0.018036461 -0.018036461
## 152 -0.044056705 -0.044056705
## 153 -0.045655882 -0.045655882
## 154 -0.053561485 -0.053561485
## 155 -0.061073112 -0.061073112
## 156 -0.074514476 -0.074514476
## 157 -0.092152395 -0.092152395
## 158 -0.099684552 -0.099684552
## 159 -0.105806133 -0.105806133
## 160 -0.109452166 -0.109452166
## 161 -0.114835447 -0.114835447
## 162 -0.117160000 -0.117160000
## 163 -0.134164782 -0.134164782
## 164 -0.150254311 -0.150254311
## 165 -0.157423043 -0.157423043
## 166 -0.162269410 -0.162269410
## 167 -0.162817903 -0.162817903
## 168 -0.163597528 -0.163597528
## 169 -0.185678462 -0.185678462
## 170 -0.192902067 -0.192902067
## 171 -0.203066892 -0.203066892
## 172 -0.213136236 -0.213136236
## 173 -0.220528604 -0.220528604
## 174 -0.223090446 -0.223090446
## 175 -0.237567208 -0.237567208
## 176 -0.263202612 -0.263202612
## 177 -0.295455534 -0.295455534
## 178 -0.296671520 -0.296671520
## 179 -0.309570046 -0.309570046
## 180 -0.311977304 -0.311977304
## 181 -0.367701606 -0.367701606
## 182 -0.375371417 -0.375371417
## 183 -0.385686555 -0.385686555
## 184 -0.389109854 -0.389109854
## 185 -0.396121971 -0.396121971
## 186 -0.397940205 -0.397940205
## 187 -0.398640975 -0.398640975
## 188 -0.411284484 -0.411284484
## 189 -0.430808520 -0.430808520
## 190 -0.433416897 -0.433416897
## 191 -0.435320160 -0.435320160
## 192 -0.443269308 -0.443269308
## 193 -0.457129884 -0.457129884
## 194 -0.488668778 -0.488668778
## 195 -0.489361682 -0.489361682
## 196 -0.499753053 -0.499753053
## 197 -0.508540098 -0.508540098
## 198 -0.510738782 -0.510738782
## 199 -0.534195342 -0.534195342
## 200 -0.568512002 -0.568512002
## 201 -0.587011296 -0.587011296
## 202 -0.605204260 -0.605204260
## 203 -0.639239779 -0.639239779
## 204 -0.641269165 -0.641269165
## 205 -0.643631695 -0.643631695
## 206 -0.671984095 -0.671984095
## 207 -0.676991454 -0.676991454
## 208 -0.687037561 -0.687037561
## 209 -0.695774511 -0.695774511
## 210 -0.699233824 -0.699233824
## 211 -0.707133293 -0.707133293
## 212 -0.729793251 -0.729793251
## 213 -0.740132494 -0.740132494
## 214 -0.758133636 -0.758133636
## 215 -0.762747154 -0.762747154
## 216 -0.768619109 -0.768619109
## 217 -0.798767486 -0.798767486
## 218 -0.809918140 -0.809918140
## 219 -0.818705967 -0.818705967
## 220 -0.825542100 -0.825542100
## 221 -0.829974612 -0.829974612
## 222 -0.830270582 -0.830270582
## 223 -0.849570179 -0.849570179
## 224 -0.881392311 -0.881392311
## 225 -0.882978202 -0.882978202
## 226 -0.897651017 -0.897651017
## 227 -0.899364711 -0.899364711
## 228 -0.905891723 -0.905891723
## 229 -0.930771381 -0.930771381
## 230 -0.932135860 -0.932135860
## 231 -0.932472585 -0.932472585
## 232 -0.959379881 -0.959379881
## 233 -0.964540224 -0.964540224
## 234 -0.973918285 -0.973918285
## 235 -0.978912591 -0.978912591
## 236 -1.113347038 -1.113347038
## 237 -1.123566409 -1.123566409
## 238 -1.127610029 -1.127610029
## 239 -1.206130668 -1.206130668
## 240 -1.226791192 -1.226791192
## 241 -1.267127600 -1.267127600
## 242 -1.276160590 -1.276160590
## 243 -1.290150611 -1.290150611
## 244 -1.306676014 -1.306676014
## 245 -1.342602475 -1.342602475
## 246 -1.420062586 -1.420062586
## 247 -1.445316100 -1.445316100
## 248 -1.467434023 -1.467434023
## 249 -1.678315749 -1.678315749
## 250 -1.864832468 -1.864832468
if(METHOD_FEATURE_FLAG==3 ){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  library(reshape2)  # provides melt()
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.7762

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  # One-versus-rest ROC curve and AUC for each class
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # Colors 2..(n+1) match the order in which the curves were drawn
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_rf_AUC<-mean_auc
}
print(FeatEval_Median_rf_AUC)
## Area under the curve: 0.7762

9.2.6. SVM

9.2.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName

set.seed(123)  # keep the train/test split reproducible, as in the random forest section
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 221 samples
## 250 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 177, 177, 177, 176 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.9275758  0.8324136
##   0.50  0.9321212  0.8435211
##   1.00  0.9184848  0.8096845
## 
## Tuning parameter 'sigma' was held constant at a value of 0.002092245
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002092245 and C = 0.5.
print(svm_model$bestTune)
##         sigma   C
## 2 0.002092245 0.5
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.9260606
FeatEval_Median_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Median_mean_accuracy_cv_svm)
## [1] 0.9260606
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.995475113122172"
FeatEval_Median_svm_trainAccuracy <- train_accuracy
print(FeatEval_Median_svm_trainAccuracy)
## [1] 0.9954751
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Median_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Median_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       63        1
##   Dementia  3       27
##                                           
##                Accuracy : 0.9574          
##                  95% CI : (0.8946, 0.9883)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 4.017e-10       
##                                           
##                   Kappa : 0.9003          
##                                           
##  Mcnemar's Test P-Value : 0.6171          
##                                           
##             Sensitivity : 0.9545          
##             Specificity : 0.9643          
##          Pos Pred Value : 0.9844          
##          Neg Pred Value : 0.9000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6702          
##    Detection Prevalence : 0.6809          
##       Balanced Accuracy : 0.9594          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Median_svm_Accuracy <- cm_FeatEval_Median_svm$overall["Accuracy"]
cm_FeatEval_Median_svm_Kappa <- cm_FeatEval_Median_svm$overall["Kappa"]
print(cm_FeatEval_Median_svm_Accuracy)
##  Accuracy 
## 0.9574468
print(cm_FeatEval_Median_svm_Kappa)
##     Kappa 
## 0.9003181

Let’s take a look at the feature importance of the trained model.

library(iml)
# Permutation feature importance via iml; note this is computed on the
# full data set (train + test), not on held-out data only.
predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM, loss = "ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 315 rows and 251 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg06864789          1.40        1.6          1.76        0.02539683
## 2 cg03084184          1.20        1.6          1.76        0.02539683
## 3        PC1          1.40        1.4          1.60        0.02222222
## 4 cg04242342          1.24        1.4          1.56        0.02222222
## 5 cg14780448          1.20        1.4          1.40        0.02222222
## 6 cg16390578          1.20        1.4          1.40        0.02222222
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1,
    target = "DX", nsim = 10, metric = "bal_accuracy",
    pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
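
From this results table (columns `feature` and `importance`, as shown in the head of the output above), the top-N features for the selection step can be pulled out. A minimal sketch, with N = 40 mirroring the TOP-number example in the version notes:

```r
Top_N <- 40  # number of top CpGs to keep, per the version notes
top_SVM_features <- head(
  importance_SVM_df[order(-importance_SVM_df$importance), "feature"],
  Top_N
)
print(top_SVM_features)
```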
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (test_data_SVM1$DX Dementia) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9746
## [1] "The AUC value is:"
## Area under the curve: 0.9746

if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # legend colours must match the plotted curves: "blue" first, then i + 1
  legend("bottomright", legend = classes,
         col = c("blue", (2:length(classes)) + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_svm_AUC <- mean_auc
}
print(FeatEval_Median_svm_AUC )
## Area under the curve: 0.9746

9.3 Selection Based on Frequency

9.3.1 Input Features for Evaluation

Evaluate the performance of the output features selected with the frequency method.
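
The frequency / common-feature method outlined in the version notes (take the top N features per model, count how often each feature appears, keep those appearing in more than half of the models) can be sketched as follows. Here `importance_list` and its columns are hypothetical stand-ins for the per-model importance tables such as `importance_SVM_df`:

```r
# Minimal sketch of the frequency / common-feature selection, assuming
# `importance_list` is a named list of per-model importance data frames
# with columns `feature` and `importance` (hypothetical inputs).
select_common_features <- function(importance_list, top_n = 40) {
  # step 1: take the top_n features of each model
  top_sets <- lapply(importance_list, function(df) {
    head(df$feature[order(-df$importance)], top_n)
  })
  # step 2: count how often each feature appears across models
  freq <- table(unlist(top_sets))
  # step 3: keep features appearing in more than half of the models
  names(freq)[freq > length(importance_list) / 2]
}
```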

processed_dataFrame<-df_process_Output_freq
processed_data<-output_Frequency_Feature

AfterProcess_FeatureName<-df_process_frequency_FeatureName
print(head(output_Frequency_Feature))
## # A tibble: 6 × 263
##   DX         PC1     PC2 cg11787167 cg09216282 cg01680303 cg12080266 cg19503462 cg02356645 cg06378561 cg07152869 cg03084184 cg01013522 cg26739327 cg06864789 cg14780448 cg02932958 cg12858518 cg04124201
##   <fct>    <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CN    -0.173    0.0575     0.0467      0.924      0.134      0.945      0.454      0.583      0.938      0.505      0.788      0.886     0.769      0.461      0.670       0.421      0.929      0.331
## 2 CN    -0.00367  0.0837     0.326       0.926      0.757      0.936      0.700      0.570      0.515      0.835      0.455      0.543     0.873      0.875      0.621       0.383      0.902      0.324
## 3 Deme… -0.187   -0.0112     0.432       0.935      0.477      0.640      0.719      0.568      0.940      0.519      0.781      0.843     0.834      0.490      0.0443      0.762      0.919      0.433
## 4 CN    -0.0379   0.0157     0.465       0.866      0.513      0.575      0.421      0.919      0.927      0.808      0.773      0.824     0.105      0.0542     0.913       0.761      0.935      0.307
## 5 Deme… -0.139    0.0299     0.0569      0.921      0.110      0.539      0.740      0.907      0.927      0.773      0.944      0.512     0.757      0.835      0.911       0.396      0.912      0.375
## 6 CN    -0.213    0.0518     0.421       0.921      0.304      0.554      0.417      0.895      0.513      0.805      0.422      0.492     0.0827     0.374      0.655       0.699      0.896      0.373
## # ℹ 244 more variables: cg03982462 <dbl>, cg12306781 <dbl>, cg23432430 <dbl>, cg00322003 <dbl>, cg04109990 <dbl>, cg27114706 <dbl>, cg15775217 <dbl>, cg20218135 <dbl>, cg03392100 <dbl>,
## #   cg17044529 <dbl>, cg27452255 <dbl>, cg02078724 <dbl>, cg05096415 <dbl>, cg20507276 <dbl>, cg25561557 <dbl>, cg17623720 <dbl>, cg17118775 <dbl>, cg12471283 <dbl>, cg00421199 <dbl>,
## #   cg02217425 <dbl>, cg16338321 <dbl>, cg20913114 <dbl>, cg14764203 <dbl>, cg15730644 <dbl>, cg16715186 <dbl>, cg24861747 <dbl>, cg09584650 <dbl>, cg09650803 <dbl>, cg23698271 <dbl>,
## #   cg12702014 <dbl>, cg22901347 <dbl>, cg07584620 <dbl>, cg13799572 <dbl>, cg18339359 <dbl>, cg22274273 <dbl>, cg10701746 <dbl>, cg04798314 <dbl>, cg01280698 <dbl>, cg05749243 <dbl>,
## #   cg26474732 <dbl>, cg06870118 <dbl>, cg15700429 <dbl>, cg24065597 <dbl>, cg05841700 <dbl>, cg03640465 <dbl>, cg18526121 <dbl>, cg21575308 <dbl>, cg11716267 <dbl>, cg15591384 <dbl>,
## #   cg10786572 <dbl>, cg21578644 <dbl>, cg07138269 <dbl>, cg24104387 <dbl>, cg12240569 <dbl>, cg04218584 <dbl>, cg21501207 <dbl>, cg03172493 <dbl>, cg11835797 <dbl>, cg00841008 <dbl>,
## #   cg18662228 <dbl>, cg02302183 <dbl>, cg13080267 <dbl>, cg04831745 <dbl>, cg11358878 <dbl>, cg02901522 <dbl>, cg14170504 <dbl>, cg14924512 <dbl>, cg16390578 <dbl>, cg09247979 <dbl>, …
print(df_process_frequency_FeatureName)
##   [1] "PC1"        "PC2"        "cg11787167" "cg09216282" "cg01680303" "cg12080266" "cg19503462" "cg02356645" "cg06378561" "cg07152869" "cg03084184" "cg01013522" "cg26739327" "cg06864789" "cg14780448"
##  [16] "cg02932958" "cg12858518" "cg04124201" "cg03982462" "cg12306781" "cg23432430" "cg00322003" "cg04109990" "cg27114706" "cg15775217" "cg20218135" "cg03392100" "cg17044529" "cg27452255" "cg02078724"
##  [31] "cg05096415" "cg20507276" "cg25561557" "cg17623720" "cg17118775" "cg12471283" "cg00421199" "cg02217425" "cg16338321" "cg20913114" "cg14764203" "cg15730644" "cg16715186" "cg24861747" "cg09584650"
##  [46] "cg09650803" "cg23698271" "cg12702014" "cg22901347" "cg07584620" "cg13799572" "cg18339359" "cg22274273" "cg10701746" "cg04798314" "cg01280698" "cg05749243" "cg26474732" "cg06870118" "cg15700429"
##  [61] "cg24065597" "cg05841700" "cg03640465" "cg18526121" "cg21575308" "cg11716267" "cg15591384" "cg10786572" "cg21578644" "cg07138269" "cg24104387" "cg12240569" "cg04218584" "cg21501207" "cg03172493"
##  [76] "cg11835797" "cg00841008" "cg18662228" "cg02302183" "cg13080267" "cg04831745" "cg11358878" "cg02901522" "cg14170504" "cg14924512" "cg16390578" "cg09247979" "cg24851651" "cg04242342" "cg18037388"
##  [91] "cg18821122" "cg04467639" "cg00977253" "cg08584917" "cg26889118" "cg14904299" "cg17329602" "cg06697310" "cg07456472" "cg23916408" "cg21533482" "cg11834635" "cg14465143" "cg16098618" "cg02656016"
## [106] "cg05351360" "cg10507965" "cg17811452" "cg12284872" "cg00999469" "cg02823329" "cg08397053" "cg12279734" "cg06624143" "cg03628603" "cg02389264" "cg05373298" "cg04073914" "cg16268937" "cg03115532"
## [121] "cg14252149" "cg10542624" "cg16361249" "cg13226272" "cg27224751" "cg12074150" "cg00332268" "cg27187580" "cg19555075" "cg04867412" "cg25174111" "cg15399577" "cg04033559" "cg11314779" "cg04845852"
## [136] "cg04768387" "cg22653957" "cg24422984" "cg17002338" "cg21986118" "cg23813394" "cg02489327" "cg12466610" "cg04771146" "cg01608425" "cg07304760" "cg13885788" "cg11227702" "cg12689021" "cg17906851"
## [151] "cg05377703" "cg02495179" "cg04664583" "cg26948066" "cg20094343" "cg00156497" "cg27341708" "cg02981548" "cg16020483" "cg18861767" "cg03327352" "cg27639199" "cg02627240" "cg22681945" "cg11109139"
## [166] "cg02095601" "cg16733676" "cg16089727" "cg17419220" "cg17429539" "cg10058204" "cg12776173" "cg25758034" "cg06032337" "cg10829391" "cg26007606" "cg14181112" "cg26081710" "cg00051154" "cg01130884"
## [181] "cg17386240" "cg12333628" "cg26983017" "cg24638099" "PC3"        "cg19248407" "cg16310958" "cg23836570" "cg03167407" "cg06012621" "cg21757617" "cg05161773" "cg03359067" "cg02872767" "cg12108278"
## [196] "cg27286614" "cg24859648" "cg12556569" "cg16858433" "cg19512141" "cg06264882" "cg10666341" "cg00675157" "cg26052728" "cg08242313" "cg22071943" "cg12434901" "cg23840008" "cg11173002" "cg05059349"
## [211] "cg05321907" "cg23350716" "cg00648024" "cg11706829" "cg02494911" "cg10844498" "cg03187614" "cg04970287" "cg12213037" "cg05813498" "cg20678988" "cg18029737" "cg12012426" "cg12421087" "cg16431720"
## [226] "age.now"    "cg17296678" "cg26901661" "cg07951602" "cg17348244" "cg03057303" "cg07971231" "cg01097733" "cg04577745" "cg05125667" "cg20070588" "cg15535896" "cg12293347" "cg26757229" "cg06875704"
## [241] "cg22251955" "cg23947654" "cg09518270" "cg06536614" "cg11331837" "cg23161429" "cg09993718" "cg00729708" "cg19848641" "cg12738248" "cg01802772" "cg10985055" "cg03088219" "cg16536985" "cg26089705"
## [256] "cg12925689" "cg05130642" "cg05138546" "cg16527629" "cg11826549" "cg06002867" "cg20704148"
print(length(df_process_frequency_FeatureName))
## [1] 262
Num_KeyFea_Frequency <- length(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
##                           DX          PC1         PC2 cg11787167 cg09216282 cg01680303 cg12080266 cg19503462 cg02356645 cg06378561 cg07152869 cg03084184 cg01013522 cg26739327 cg06864789 cg14780448
## 200223270003_R03C01       CN -0.172761185  0.05745834 0.04673831  0.9244259  0.1344941  0.9450629  0.4537684  0.5833923  0.9377503   0.505063  0.7877128  0.8862821  0.7693268  0.4605312 0.67021018
## 200223270003_R06C01       CN -0.003667305  0.08372861 0.32564508  0.9263996  0.7573869  0.9363381  0.6997359  0.5701428  0.5154019   0.835249  0.4546397  0.5425308  0.8727608  0.8751365 0.62073547
## 200223270003_R07C01 Dementia -0.186779607 -0.01117250 0.43162543  0.9352308  0.4772204  0.6398247  0.7189778  0.5683381  0.9403569   0.519430  0.7812413  0.8429862  0.8340445  0.4902033 0.04425741
##                     cg02932958 cg12858518 cg04124201 cg03982462 cg12306781 cg23432430 cg00322003 cg04109990 cg27114706 cg15775217 cg20218135 cg03392100 cg17044529 cg27452255 cg02078724 cg05096415
## 200223270003_R03C01  0.4210489  0.9285252  0.3308589  0.6023731  0.8663817  0.9455418  0.5702070  0.6476604  0.9359259  0.9168327 0.64278153  0.9227394  0.9117895  0.6593379  0.2896133  0.5177819
## 200223270003_R06C01  0.3825995  0.9017533  0.3241613  0.8778458  0.8027798  0.9418716  0.3077122  0.6692040  0.9285384  0.6042521 0.06509247  0.8902340  0.9290636  0.9012217  0.2805612  0.6288426
## 200223270003_R07C01  0.7617081  0.9187879  0.4332693  0.8860227  0.8787250  0.9426559  0.6104341  0.9024920  0.4787397  0.9062231 0.65642359  0.4359657  0.9402858  0.8898635  0.2739571  0.6060271
##                     cg20507276 cg25561557 cg17623720 cg17118775 cg12471283 cg00421199 cg02217425 cg16338321 cg20913114 cg14764203 cg15730644 cg16715186 cg24861747 cg09584650 cg09650803 cg23698271
## 200223270003_R03C01 0.38721972 0.03851635  0.8988624  0.5585676  0.8658731  0.8532461  0.1032503  0.8294062 0.80382984  0.4683709  0.4353906  0.7946153  0.4309505 0.09661586  0.8954464  0.9109565
## 200223270003_R06C01 0.47978438 0.47259480  0.8172384  0.2916054  0.6963410  0.8891803  0.6592850  0.4918708 0.03158439  0.8916566  0.8763048  0.8124316  0.8071462 0.52399749  0.9113477  0.9051701
## 200223270003_R07C01 0.02261996 0.43364249  0.8226085  0.2868948  0.6680611  0.8937751  0.8792021  0.5245645 0.81256840  0.8714472  0.4833709  0.7773263  0.3347317 0.11587211  0.2518414  0.8804362
##                     cg12702014  cg22901347 cg07584620 cg13799572 cg18339359 cg22274273 cg10701746 cg04798314 cg01280698 cg05749243 cg26474732 cg06870118 cg15700429 cg24065597 cg05841700 cg03640465
## 200223270003_R03C01  0.7848681 0.001690332  0.3763980  0.8449584  0.9040272  0.4246379  0.4868342 0.07119798 0.88462009  0.9209685  0.8184088  0.8100144  0.9114530  0.2221098  0.9146488  0.2531644
## 200223270003_R06C01  0.8065993 0.103413834  0.8530961  0.4409219  0.8552121  0.4196796  0.4927257 0.09248843 0.88471320  0.9143061  0.7358417  0.7802055  0.8838233  0.7036129  0.3737990  0.2904433
## 200223270003_R07C01  0.7458594 0.632991482  0.3888623  0.8516975  0.3073106  0.4164100  0.8552180 0.06972566 0.06370005  0.9121180  0.7509296  0.7917257  0.9095363  0.2407676  0.5046468  0.9024530
##                     cg18526121 cg21575308 cg11716267 cg15591384 cg10786572 cg21578644 cg07138269 cg24104387 cg12240569 cg04218584 cg21501207 cg03172493 cg11835797 cg00841008 cg18662228 cg02302183
## 200223270003_R03C01  0.4762313 0.44702405 0.04959702  0.7870275  0.5982086  0.9260863  0.9426707  0.5339034 0.02690547  0.8971263  0.6813712 0.63362492  0.9007408 0.61899333  0.8730153  0.9191148
## 200223270003_R06C01  0.4833367 0.44792570 0.49143010  0.7429614  0.0935115  0.9159726  0.5057781  0.3007614 0.46030640  0.8491768  0.4747229 0.06148804  0.8944957 0.05401588  0.8602464  0.8749250
## 200223270003_R07C01  0.7761450 0.02822675 0.45857830  0.8346279  0.8436837  0.9178001  0.9400527  0.7509780 0.86185839  0.9008137  0.7422003 0.64562298  0.8168544 0.90769205  0.8683578  0.8888247
##                     cg13080267 cg04831745 cg11358878 cg02901522 cg14170504 cg14924512 cg16390578 cg09247979 cg24851651 cg04242342 cg18037388 cg18821122 cg04467639 cg00977253 cg08584917 cg26889118
## 200223270003_R03C01 0.78371483 0.71214149 0.83252951  0.9372901 0.02236650  0.9160885 0.20983422  0.5706177 0.05358297  0.8167892  0.7545086  0.5901603  0.6400206  0.9145988  0.9019732  0.9154836
## 200223270003_R06C01 0.09436069 0.06871768 0.87521203  0.4954978 0.02988245  0.9088414 0.06389068  0.5090215 0.05968923  0.8040357  0.7294565  0.5779620  0.5657041  0.8944518  0.9187789  0.9101336
## 200223270003_R07C01 0.09351259 0.90994644 0.08917903  0.9381188 0.48543531  0.9081681 0.23101450  0.5066661 0.60864179  0.8286115  0.2391659  0.9251431  0.6302917  0.9150206  0.6007449  0.5759967
##                     cg14904299 cg17329602 cg06697310 cg07456472 cg23916408 cg21533482 cg11834635 cg14465143 cg16098618 cg02656016 cg05351360 cg10507965 cg17811452 cg12284872 cg00999469 cg02823329
## 200223270003_R03C01  0.2712472  0.8189317  0.8653044  0.5856904  0.9154993  0.8288469  0.8880887  0.5543068  0.2571464  0.2355680 0.03855181  0.4010973 0.82740141  0.7414569  0.2857719  0.6464005
## 200223270003_R06C01  0.8364544  0.8478185  0.2405168  0.3886482  0.8886255  0.6766373  0.2493491  0.2702875  0.6899734  0.9052318 0.76395533  0.4033691 0.09338396  0.7725267  0.2499229  0.9633930
## 200223270003_R07C01  0.8193867  0.8596400  0.8479193  0.9186405  0.8872447  0.6235932  0.2210428  0.2621492  0.6488005  0.8653682 0.77000888  0.3869543 0.79817238  0.7573369  0.2819622  0.6617541
##                     cg08397053 cg12279734 cg06624143 cg03628603 cg02389264 cg05373298 cg04073914 cg16268937 cg03115532 cg14252149 cg10542624 cg16361249 cg13226272 cg27224751 cg12074150 cg00332268
## 200223270003_R03C01 0.04199567  0.1494651  0.4899758  0.9157246  0.7900942 0.02652391 0.03089677  0.8931712  0.8659608 0.02450779 0.02189577 0.52843073  0.5410002 0.03214912 0.18602738  0.9044887
## 200223270003_R06C01 0.04437741  0.8760759  0.9107688  0.8851075  0.7789974 0.83538124 0.89962516  0.9034556  0.8533871 0.02382413 0.54330620 0.09039669  0.4437070 0.83123722 0.14231506  0.5777209
## 200223270003_R07C01 0.59796746  0.8674214  0.9217350  0.8923890  0.4174463 0.89506024 0.47195215  0.8928450  0.4416574 0.56212480 0.54991492 0.42039062  0.0265215 0.79732117 0.09201303  0.5848006
##                     cg27187580 cg19555075 cg04867412 cg25174111 cg15399577 cg04033559 cg11314779 cg04845852 cg04768387 cg22653957 cg24422984 cg17002338 cg21986118 cg23813394 cg02489327 cg12466610
## 200223270003_R03C01  0.6643576  0.4921409  0.8796800  0.8573844  0.8785443  0.8768243  0.8966100  0.9212268  0.9465814  0.6442184  0.5462594  0.2684163  0.6571296 0.48811365  0.8616312 0.59131778
## 200223270003_R06C01  0.6914924  0.4261618  0.4497115  0.2567745  0.8703169  0.8257388  0.8908661  0.5118209  0.9098563  0.9531308  0.5193121  0.2811103  0.7034445 0.02943436  0.8777949 0.06939623
## 200223270003_R07C01  0.9357074  0.4694729  0.4445373  0.1903803  0.8968856  0.8900962  0.9048316  0.9034373  0.9413240  0.6534542  0.1970387  0.2706349  0.9055894 0.92935625  0.4205073 0.04527733
##                     cg04771146 cg01608425 cg07304760 cg13885788 cg11227702 cg12689021 cg17906851 cg05377703 cg02495179 cg04664583 cg26948066 cg20094343 cg00156497 cg27341708 cg02981548 cg16020483
## 200223270003_R03C01  0.7648566  0.9264388  0.5798534  0.9369476 0.49184121  0.7449475  0.9529718  0.8213047  0.7373055  0.5881190  0.5026045  0.7128750  0.5194653 0.02613847  0.5220037  0.1673606
## 200223270003_R06C01  0.3125007  0.8887753  0.5575516  0.5163017 0.02543724  0.7872237  0.6462151  0.5152514  0.5588114  0.9352717  0.9101976  0.3291595  0.9024063 0.86893582  0.5098965  0.1209622
## 200223270003_R07C01  0.2909958  0.9065432  0.9195617  0.9183376 0.45150971  0.7523141  0.9553497  0.7773036  0.5273309  0.9350230  0.9379543  0.4013815  0.9067989 0.02642300  0.5660985  0.2499647
##                     cg18861767 cg03327352 cg27639199 cg02627240 cg22681945 cg11109139 cg02095601 cg16733676 cg16089727 cg17419220 cg17429539 cg10058204 cg12776173 cg25758034 cg06032337 cg10829391
## 200223270003_R03C01  0.7847380  0.8786878 0.67552763 0.57129408  0.8388195  0.6350109  0.9161259  0.8904541 0.54996692 0.43470227  0.7100923  0.5834496  0.8730635  0.6649219  0.5657198  0.5929616
## 200223270003_R06C01  0.4734572  0.3042310 0.06233093 0.05309659  0.8700500  0.6904482  0.2233062  0.1698111 0.05876736 0.02781411  0.7660838  0.0549494  0.7009491  0.2393844  0.5653758  0.9411947
## 200223270003_R07C01  0.7312175  0.8273211 0.05701332 0.52179136  0.3344105  0.6274025  0.8978191  0.9203317 0.85485461 0.42803809  0.6984969  0.5689591  0.1136716  0.7071501  0.5229594  0.9322956
##                     cg26007606 cg14181112 cg26081710 cg00051154 cg01130884 cg17386240 cg12333628 cg26983017 cg24638099          PC3 cg19248407 cg16310958 cg23836570 cg03167407 cg06012621 cg21757617
## 200223270003_R03C01  0.5615550  0.1615405  0.9198212 0.08370609  0.6230659  0.7144809  0.9092861 0.03145466  0.4262170  0.005055871  0.8313131  0.9300073 0.54259383  0.7610292  0.8579519  0.4429909
## 200223270003_R06C01  0.1463111  0.3424621  0.8801892 0.61288950  0.2847748  0.8074824  0.5084647 0.84677625  0.8787392  0.029143653  0.8525281  0.9228871 0.03267304  0.3087606  0.5325037  0.4472538
## 200223270003_R07C01  0.8101800  0.2178314  0.9153264 0.07638127  0.2313285  0.7227918  0.5229394 0.53922255  0.8682765 -0.032302430  0.8467857  0.8539019 0.59939745  0.2455453  0.6263080  0.4339315
##                     cg05161773 cg03359067 cg02872767 cg12108278 cg27286614 cg24859648 cg12556569 cg16858433 cg19512141 cg06264882 cg10666341 cg00675157 cg26052728 cg08242313 cg22071943 cg12434901
## 200223270003_R03C01  0.4154907  0.8628564  0.3886537  0.9243869  0.5933858 0.44392797 0.03924599  0.9194211  0.7903543 0.43678655  0.6731062  0.9242325  0.1513937  0.8953645  0.2442648  0.8458468
## 200223270003_R06C01  0.8526849  0.8144536  0.9099575  0.9068995  0.6348795 0.03341185 0.48636893  0.9271632  0.8404684 0.43703442  0.6443180  0.9254708  0.5254754  0.8573493  0.2644581  0.8299579
## 200223270003_R07C01  0.4259275  0.8737908  0.8603283  0.9131367  0.9468370 0.43582347 0.46498877  0.9288986  0.2202759 0.02439581  0.8970292  0.5447244  0.5600724  0.8992114  0.2599947  0.8482994
##                     cg23840008 cg11173002 cg05059349 cg05321907 cg23350716 cg00648024 cg11706829 cg02494911 cg10844498 cg03187614 cg04970287 cg12213037 cg05813498 cg20678988 cg18029737 cg12012426
## 200223270003_R03C01 0.66547425  0.5913599 0.04507417  0.1782629  0.7876873 0.40202875  0.5444785  0.2416332  0.1391318  0.8826518  0.8875750   0.248785  0.9039353  0.8548886  0.9016634  0.9434768
## 200223270003_R06C01 0.88483246  0.1878736 0.03898752  0.8427929  0.6960544 0.05579011  0.5669449  0.2520909  0.1385549  0.5131472  0.4651667   0.812695  0.6252849  0.7786685  0.7376586  0.9220044
## 200223270003_R07C01 0.09020907  0.5150840 0.85329923  0.8320504  0.7387498 0.03708944  0.8746449  0.2457032  0.7374725  0.5281030  0.9092326   0.506374  0.9086932  0.8260541  0.9397667  0.9241284
##                     cg12421087 cg16431720  age.now cg17296678 cg26901661 cg07951602 cg17348244 cg03057303 cg07971231 cg01097733 cg04577745 cg05125667 cg20070588 cg15535896 cg12293347 cg26757229
## 200223270003_R03C01  0.5399655  0.8692449 78.60000  0.5653917  0.8754981  0.8766842 0.81793075  0.8923039  0.8406145  0.6753081  0.2681033 0.54151552  0.5057088  0.9253926  0.9253031  0.1422661
## 200223270003_R06C01  0.5400348  0.8773137 80.40000  0.5272971  0.9021064  0.8918089 0.07241099  0.4954311  0.8447914  0.9131513  0.8570624 0.49090787  0.8654344  0.3320191  0.9176094  0.7933794
## 200223270003_R07C01  0.5291975  0.8988328 78.16441  0.7661613  0.8556831  0.8706938 0.78025001  0.4695066  0.8874706  0.6832952  0.9002276 0.01590936  0.8425849  0.9409104  0.6028463  0.8074830
##                     cg06875704 cg22251955 cg23947654 cg09518270 cg06536614 cg11331837 cg23161429 cg09993718 cg00729708 cg19848641 cg12738248 cg01802772 cg10985055  cg03088219 cg16536985 cg26089705
## 200223270003_R03C01  0.9181165 0.02254441  0.8079296  0.8870663  0.5746694 0.57150125  0.9099619  0.7227856  0.1188099  0.9155493 0.88010292 0.02361869  0.8631895 0.007435243  0.5418687 0.50810373
## 200223270003_R06C01  0.9200461 0.02714054  0.8017579  0.8765622  0.5773468 0.03182862  0.8833895  0.4378752  0.1206326  0.4888000 0.51121855 0.02401520  0.5456633 0.120155222  0.8392044 0.03322136
## 200223270003_R07C01  0.9048289 0.89577950  0.7584946  0.8135001  0.5848917 0.03832164  0.9134709  0.7067889  0.7636159  0.9139292 0.09131476 0.02200957  0.8825100 0.826554308  0.8822891 0.03118009
##                     cg12925689 cg05130642 cg05138546 cg16527629 cg11826549 cg06002867 cg20704148
## 200223270003_R03C01 0.38196419  0.8644077  0.6230487  0.4365003 0.04794983 0.84888752 0.02409027
## 200223270003_R06C01 0.02873309  0.3661324  0.8963047  0.0708336 0.03672380 0.02698175 0.02580923
## 200223270003_R07C01 0.38592071  0.3039272  0.9057159  0.4492586 0.51173417 0.48042117 0.47357786
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

9.3.2. Logistic Regression Model

9.3.2.1 Logistic Regression Model Training

df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 221 263
dim(testData)
## [1]  94 263
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Freq_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Freq_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        6
##   Dementia  2       22
##                                           
##                Accuracy : 0.9149          
##                  95% CI : (0.8392, 0.9625)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 5.403e-07       
##                                           
##                   Kappa : 0.7878          
##                                           
##  Mcnemar's Test P-Value : 0.2888          
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.7857          
##          Pos Pred Value : 0.9143          
##          Neg Pred Value : 0.9167          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7447          
##       Balanced Accuracy : 0.8777          
##                                           
##        'Positive' Class : CN              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Freq_LRM1_Accuracy <- cm_FeatEval_Freq_LRM1$overall["Accuracy"]
cm_FeatEval_Freq_LRM1_Kappa <- cm_FeatEval_Freq_LRM1$overall["Kappa"]

print(cm_FeatEval_Freq_LRM1_Accuracy)
##  Accuracy 
## 0.9148936
print(cm_FeatEval_Freq_LRM1_Kappa)
##     Kappa 
## 0.7878104
print(model_LRM1)
## glmnet 
## 
## 221 samples
## 262 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa      
##   0.10   0.006203963  0.8553535   0.63168162
##   0.10   0.019618654  0.8553535   0.62652666
##   0.10   0.062039630  0.8236364   0.52620020
##   0.55   0.006203963  0.7423232   0.34973149
##   0.55   0.019618654  0.7287879   0.28816550
##   0.55   0.062039630  0.6652525   0.03155014
##   1.00   0.006203963  0.6790909   0.20440246
##   1.00   0.019618654  0.6745455   0.13596024
##   1.00   0.062039630  0.6652525  -0.05413455
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01961865.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Freq_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Freq_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7432884
FeatEval_Freq_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Freq_mean_accuracy_cv_LRM1)
## [1] 0.7432884
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9784
## [1] "The auc value is:"
## Area under the curve: 0.9784

if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # legend colours must match the plotted curves: "blue" first, then i + 1
  legend("bottomright", legend = classes,
         col = c("blue", (2:length(classes)) + 1), lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_LRM1_AUC <- mean_auc
}
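The mean AUC above is a simple macro average over the one-versus-rest curves. As a sanity check that does not depend on pROC, the AUC for any single class can be computed directly from the Mann-Whitney rank statistic; `auc_rank` below is a minimal base-R sketch, not part of the pipeline:

```r
# Rank-based AUC (Mann-Whitney U form): the probability that a randomly
# chosen positive scores higher than a randomly chosen negative.
# Ties are handled via midranks.
auc_rank <- function(labels, scores) {
  stopifnot(all(labels %in% c(0, 1)))
  r  <- rank(scores)                 # midranks for ties
  n1 <- sum(labels == 1)             # number of positives
  n0 <- sum(labels == 0)             # number of negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: perfectly separated classes give AUC = 1
auc_rank(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))
```

Applying `auc_rank(binary_labels, prob_predictions[, class])` inside the loop above should reproduce the pROC values up to tie handling.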
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 262)
## 
##            Overall
## PC1         100.00
## PC2          45.48
## cg02872767   34.49
## cg11787167   33.77
## cg09216282   33.05
## cg19503462   30.38
## cg12080266   29.03
## cg01680303   28.74
## cg07152869   27.89
## cg02356645   27.79
## cg12108278   27.47
## cg03084184   27.44
## cg06378561   26.66
## cg01013522   26.12
## cg06864789   24.75
## cg12858518   24.72
## cg26739327   24.66
## cg02932958   23.90
## cg04109990   23.89
## cg14780448   23.83
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 ||METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
##          Overall
## 1   3.891553e+00
## 2   1.769811e+00
## 3   1.342283e+00
## 4   1.314085e+00
## 5   1.286159e+00
## 6   1.182146e+00
## 7   1.129753e+00
## 8   1.118621e+00
## 9   1.085484e+00
## 10  1.081561e+00
## 11  1.068868e+00
## 12  1.067915e+00
## 13  1.037605e+00
## 14  1.016538e+00
## 15  9.631820e-01
## 16  9.617978e-01
## 17  9.594757e-01
## 18  9.300582e-01
## 19  9.295740e-01
## 20  9.273738e-01
## 21  9.013307e-01
## 22  8.952991e-01
## 23  8.849005e-01
## 24  8.722618e-01
## 25  8.557944e-01
## 26  8.497638e-01
## 27  8.480729e-01
## 28  8.241319e-01
## 29  8.188579e-01
## 30  8.001225e-01
## 31  7.859355e-01
## 32  7.567568e-01
## 33  7.442492e-01
## 34  7.407924e-01
## 35  7.384521e-01
## 36  7.331250e-01
## 37  7.319115e-01
## 38  7.287147e-01
## 39  7.234722e-01
## 40  7.233039e-01
## 41  7.143823e-01
## 42  7.127433e-01
## 43  7.027042e-01
## 44  7.000441e-01
## 45  6.935796e-01
## 46  6.830744e-01
## 47  6.756200e-01
## 48  6.693328e-01
## 49  6.622946e-01
## 50  6.611650e-01
## 51  6.412721e-01
## 52  6.354155e-01
## 53  6.316531e-01
## 54  6.297827e-01
## 55  6.278924e-01
## 56  6.243618e-01
## 57  6.243586e-01
## 58  6.239554e-01
## 59  6.139478e-01
## 60  6.092298e-01
## 61  6.039709e-01
## 62  5.883419e-01
## 63  5.859259e-01
## 64  5.847990e-01
## 65  5.798799e-01
## 66  5.758870e-01
## 67  5.741995e-01
## 68  5.713401e-01
## 69  5.690582e-01
## 70  5.643614e-01
## 71  5.632614e-01
## 72  5.503061e-01
## 73  5.452603e-01
## 74  5.431653e-01
## 75  5.371287e-01
## 76  5.266458e-01
## 77  5.262964e-01
## 78  5.230496e-01
## 79  5.110257e-01
## 80  5.095127e-01
## 81  5.032578e-01
## 82  5.022402e-01
## 83  4.993609e-01
## 84  4.894193e-01
## 85  4.865928e-01
## 86  4.841555e-01
## 87  4.795481e-01
## 88  4.680907e-01
## 89  4.655083e-01
## 90  4.646296e-01
## 91  4.639847e-01
## 92  4.619308e-01
## 93  4.612833e-01
## 94  4.520255e-01
## 95  4.518823e-01
## 96  4.460468e-01
## 97  4.451642e-01
## 98  4.437013e-01
## 99  4.429791e-01
## 100 4.415990e-01
## 101 4.414815e-01
## 102 4.352623e-01
## 103 4.229965e-01
## 104 4.196010e-01
## 105 4.177809e-01
## 106 4.177024e-01
## 107 4.146907e-01
## 108 4.103114e-01
## 109 4.088972e-01
## 110 4.058537e-01
## 111 3.944433e-01
## 112 3.933724e-01
## 113 3.917739e-01
## 114 3.868203e-01
## 115 3.835314e-01
## 116 3.767963e-01
## 117 3.735080e-01
## 118 3.700108e-01
## 119 3.693772e-01
## 120 3.580099e-01
## 121 3.551547e-01
## 122 3.522681e-01
## 123 3.477036e-01
## 124 3.438816e-01
## 125 3.387455e-01
## 126 3.350908e-01
## 127 3.310719e-01
## 128 3.250698e-01
## 129 3.224805e-01
## 130 3.212747e-01
## 131 3.200753e-01
## 132 3.148939e-01
## 133 3.133766e-01
## 134 3.090486e-01
## 135 3.046858e-01
## 136 3.038941e-01
## 137 3.035632e-01
## 138 3.018846e-01
## 139 3.000557e-01
## 140 3.000396e-01
## 141 2.961022e-01
## 142 2.954190e-01
## 143 2.953970e-01
## 144 2.913662e-01
## 145 2.900439e-01
## 146 2.899270e-01
## 147 2.848307e-01
## 148 2.846790e-01
## 149 2.750170e-01
## 150 2.724026e-01
## 151 2.708394e-01
## 152 2.667148e-01
## 153 2.491467e-01
## 154 2.483096e-01
## 155 2.470661e-01
## 156 2.451754e-01
## 157 2.444096e-01
## 158 2.441656e-01
## 159 2.421101e-01
## 160 2.417087e-01
## 161 2.374334e-01
## 162 2.363601e-01
## 163 2.309304e-01
## 164 2.286247e-01
## 165 2.231685e-01
## 166 2.224297e-01
## 167 2.205450e-01
## 168 2.191152e-01
## 169 2.186434e-01
## 170 2.155090e-01
## 171 2.094638e-01
## 172 2.066003e-01
## 173 2.040685e-01
## 174 2.021132e-01
## 175 1.918734e-01
## 176 1.890564e-01
## 177 1.822008e-01
## 178 1.781444e-01
## 179 1.749297e-01
## 180 1.729922e-01
## 181 1.701505e-01
## 182 1.679313e-01
## 183 1.620062e-01
## 184 1.606574e-01
## 185 1.604747e-01
## 186 1.568308e-01
## 187 1.561160e-01
## 188 1.513860e-01
## 189 1.483436e-01
## 190 1.474875e-01
## 191 1.445958e-01
## 192 1.435339e-01
## 193 1.425253e-01
## 194 1.414079e-01
## 195 1.343773e-01
## 196 1.288797e-01
## 197 1.262414e-01
## 198 1.230952e-01
## 199 1.218464e-01
## 200 1.171963e-01
## 201 1.083228e-01
## 202 1.043859e-01
## 203 1.015030e-01
## 204 9.671008e-02
## 205 9.583125e-02
## 206 8.726737e-02
## 207 8.311697e-02
## 208 8.152572e-02
## 209 7.313171e-02
## 210 7.019787e-02
## 211 6.320294e-02
## 212 6.025354e-02
## 213 5.543137e-02
## 214 5.143606e-02
## 215 4.965995e-02
## 216 4.621939e-02
## 217 4.463046e-02
## 218 4.442277e-02
## 219 4.246743e-02
## 220 2.625728e-02
## 221 2.376093e-02
## 222 1.919814e-02
## 223 1.913625e-02
## 224 1.139124e-02
## 225 6.016151e-03
## 226 5.693191e-03
## 227 2.570410e-03
## 228 2.234831e-03
## 229 2.740972e-05
## 230 0.000000e+00
## 231 0.000000e+00
## 232 0.000000e+00
## 233 0.000000e+00
## 234 0.000000e+00
## 235 0.000000e+00
## 236 0.000000e+00
## 237 0.000000e+00
## 238 0.000000e+00
## 239 0.000000e+00
## 240 0.000000e+00
## 241 0.000000e+00
## 242 0.000000e+00
## 243 0.000000e+00
## 244 0.000000e+00
## 245 0.000000e+00
## 246 0.000000e+00
## 247 0.000000e+00
## 248 0.000000e+00
## 249 0.000000e+00
## 250 0.000000e+00
## 251 0.000000e+00
## 252 0.000000e+00
## 253 0.000000e+00
## 254 0.000000e+00
## 255 0.000000e+00
## 256 0.000000e+00
## 257 0.000000e+00
## 258 0.000000e+00
## 259 0.000000e+00
## 260 0.000000e+00
## 261 0.000000e+00
## 262 0.000000e+00
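Note that `dplyr::arrange()` drops data-frame row names, which is why the ordered importance table above lists only row numbers rather than CpG identifiers. A base-R ordering keeps the names; the toy importance frame below is a hypothetical stand-in for `varImp(model_LRM1$finalModel)`:

```r
# Toy stand-in for varImp(model$finalModel): one Overall column,
# with CpG identifiers stored as row names.
imp <- data.frame(Overall   = c(0.27, 3.89, 1.34),
                  row.names = c("cg0001", "cg0002", "cg0003"))

# order() preserves row names, unlike dplyr::arrange()
ordered_imp <- imp[order(-imp$Overall), , drop = FALSE]
rownames(ordered_imp)  # "cg0002" "cg0003" "cg0001"
```

Alternatively, `tibble::rownames_to_column()` before `arrange()` keeps the identifiers as an explicit column.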
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
if (!require(reshape2)) {
  install.packages("reshape2")
}
library(reshape2)

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
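The frequency / common-feature rule from the version notes (keep a feature if it appears in the Top-N list of more than half of the models) can be sketched in base R; `top_lists` below is a hypothetical toy input, not the pipeline's actual per-model rankings:

```r
# Step 1: Top-N features selected from each trained model (toy lists)
top_lists <- list(
  LRM1 = c("cg01", "cg02", "cg03"),
  LRM2 = c("cg01", "cg02", "cg04"),
  ENM1 = c("cg01", "cg05", "cg06")
)

# Step 2: frequency of appearance of each feature across the models
freq <- table(unlist(top_lists))

# Step 3: keep features that appear in more than half of the models
common_features <- names(freq)[freq > length(top_lists) / 2]
common_features  # "cg01" "cg02"
```

With real inputs, each list element would be `head(rownames(ordered_importance), Number_N_TopNCpGs)` for one model.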

9.3.2.2 Model Diagnosis & Improvement

9.3.2.2.1 Class imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia 
##      221       94
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia 
## 0.7015873 0.2984127
table(trainData$DX)
## 
##       CN Dementia 
##      155       66
prop.table(table(trainData$DX))
## 
##        CN  Dementia 
## 0.7013575 0.2986425
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 2.351064
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 2.348485
  • Let’s run a chi-square test, which determines whether the class distribution deviates significantly from a balanced distribution. The test’s p-value indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 51.203, df = 1, p-value = 8.328e-13
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 35.842, df = 1, p-value = 2.14e-09
Address Class Imbalance Using “SMOTE” (NOT FINALIZED, MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)

balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia 
##      155      132
dim(balanced_data_LGR_1)
## [1] 287 263
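Consistent with the counts above (66 Dementia cases becoming 132), `dup_size = 1` in `smotefamily::SMOTE` adds one synthetic example per minority sample. A rough base-R calculation for the integer `dup_size` closest to full balance, assuming the training-set class counts shown earlier:

```r
counts <- c(CN = 155, Dementia = 66)  # class counts in trainData

# dup_size = 1 doubles the minority class: 66 * (1 + 1) = 132
minority_after <- min(counts) * (1 + 1)

# nearest integer dup_size for balance (exact balance is not attainable here)
dup_size_balance <- round(max(counts) / min(counts)) - 1
```

Here `dup_size_balance` evaluates to 1, so the setting used above already gets as close to balance as an integer `dup_size` allows.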
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       64        4
##   Dementia  2       24
##                                           
##                Accuracy : 0.9362          
##                  95% CI : (0.8662, 0.9762)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 2.052e-08       
##                                           
##                   Kappa : 0.8442          
##                                           
##  Mcnemar's Test P-Value : 0.6831          
##                                           
##             Sensitivity : 0.9697          
##             Specificity : 0.8571          
##          Pos Pred Value : 0.9412          
##          Neg Pred Value : 0.9231          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6809          
##    Detection Prevalence : 0.7234          
##       Balanced Accuracy : 0.9134          
##                                           
##        'Positive' Class : CN              
## 
print(model_LRM2)
## glmnet 
## 
## 287 samples
## 262 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 230, 230, 229, 230, 229 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0002615738  0.9197822  0.8385482
##   0.10   0.0026157377  0.9197822  0.8385482
##   0.10   0.0261573772  0.9197822  0.8385482
##   0.55   0.0002615738  0.8642468  0.7291173
##   0.55   0.0026157377  0.8642468  0.7291173
##   0.55   0.0261573772  0.8188748  0.6374922
##   1.00   0.0002615738  0.8293406  0.6600226
##   1.00   0.0026157377  0.8363581  0.6736509
##   1.00   0.0261573772  0.7875378  0.5747694
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.02615738.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8622168
importance_model_LRM2 <- varImp(model_LRM2)


print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 262)
## 
##            Overall
## PC1         100.00
## PC2          39.79
## cg11787167   36.90
## cg02872767   35.29
## cg19503462   32.49
## cg07152869   31.96
## cg06378561   29.64
## cg04109990   29.45
## cg12080266   29.21
## cg09216282   29.13
## cg12108278   29.06
## cg26739327   28.68
## cg01013522   28.62
## cg01680303   28.22
## cg02356645   27.30
## cg03982462   25.56
## cg03084184   25.14
## cg04124201   25.12
## cg12858518   25.07
## cg06864789   24.76
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5|| METHOD_FEATURE_FLAG==6){

importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)

ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
##         Overall
## 1   3.613116136
## 2   1.437590580
## 3   1.333385332
## 4   1.275019982
## 5   1.173826936
## 6   1.154896343
## 7   1.070934626
## 8   1.063951325
## 9   1.055286421
## 10  1.052433673
## 11  1.049900020
## 12  1.036222580
## 13  1.034194164
## 14  1.019653952
## 15  0.986216872
## 16  0.923473865
## 17  0.908395963
## 18  0.907537423
## 19  0.905886696
## 20  0.894743789
## 21  0.886013933
## 22  0.843426653
## 23  0.837238806
## 24  0.829855143
## 25  0.828136119
## 26  0.827385761
## 27  0.801523934
## 28  0.790544199
## 29  0.789186596
## 30  0.788271416
## 31  0.770095486
## 32  0.764983548
## 33  0.757300526
## 34  0.735714951
## 35  0.732416840
## 36  0.703761999
## 37  0.701669964
## 38  0.695463110
## 39  0.692204089
## 40  0.689087290
## 41  0.665940128
## 42  0.665285311
## 43  0.661039790
## 44  0.657095100
## 45  0.640014647
## 46  0.638131921
## 47  0.632385997
## 48  0.629819707
## 49  0.626198556
## 50  0.622383480
## 51  0.622261098
## 52  0.621413866
## 53  0.618881180
## 54  0.609187488
## 55  0.599955123
## 56  0.590410340
## 57  0.581352521
## 58  0.574770061
## 59  0.570410462
## 60  0.568884281
## 61  0.563790827
## 62  0.558091290
## 63  0.541752924
## 64  0.537783020
## 65  0.537376102
## 66  0.535958559
## 67  0.521561271
## 68  0.520206184
## 69  0.519232742
## 70  0.513093069
## 71  0.507146187
## 72  0.506580742
## 73  0.502528991
## 74  0.497666003
## 75  0.493170809
## 76  0.492779060
## 77  0.488082919
## 78  0.482059376
## 79  0.478461875
## 80  0.478096454
## 81  0.469920250
## 82  0.469349162
## 83  0.468247310
## 84  0.464713865
## 85  0.457719874
## 86  0.445058804
## 87  0.444048920
## 88  0.441129648
## 89  0.440963054
## 90  0.440776372
## 91  0.431625623
## 92  0.427204277
## 93  0.418388242
## 94  0.411187073
## 95  0.407434418
## 96  0.406819852
## 97  0.403677601
## 98  0.403065815
## 99  0.402676102
## 100 0.402628119
## 101 0.392760628
## 102 0.392572731
## 103 0.388232753
## 104 0.388093173
## 105 0.384737235
## 106 0.384197730
## 107 0.384139580
## 108 0.378599835
## 109 0.377596070
## 110 0.377289238
## 111 0.376590835
## 112 0.376107766
## 113 0.365155680
## 114 0.364773917
## 115 0.362860776
## 116 0.356008924
## 117 0.347013105
## 118 0.346355361
## 119 0.338920211
## 120 0.338286242
## 121 0.331880782
## 122 0.329485920
## 123 0.318459685
## 124 0.314182079
## 125 0.312859933
## 126 0.312315831
## 127 0.308889510
## 128 0.308390260
## 129 0.306173560
## 130 0.304160686
## 131 0.303165792
## 132 0.290209666
## 133 0.289420887
## 134 0.288965713
## 135 0.288579814
## 136 0.287405920
## 137 0.283629528
## 138 0.282413174
## 139 0.281288443
## 140 0.281270541
## 141 0.279822783
## 142 0.273460044
## 143 0.269142902
## 144 0.265596374
## 145 0.263347298
## 146 0.258608708
## 147 0.256871005
## 148 0.256351011
## 149 0.256284798
## 150 0.252426399
## 151 0.248839273
## 152 0.246390788
## 153 0.245765638
## 154 0.241737740
## 155 0.238879425
## 156 0.238721145
## 157 0.231045789
## 158 0.225938279
## 159 0.218101115
## 160 0.217923834
## 161 0.215987407
## 162 0.215329924
## 163 0.208431089
## 164 0.208306028
## 165 0.207783330
## 166 0.202009896
## 167 0.195728145
## 168 0.189085412
## 169 0.187502377
## 170 0.185806683
## 171 0.183690550
## 172 0.183474152
## 173 0.179934633
## 174 0.177522616
## 175 0.168038680
## 176 0.160490950
## 177 0.156734547
## 178 0.155470672
## 179 0.153510151
## 180 0.148730936
## 181 0.148380739
## 182 0.145323464
## 183 0.139044096
## 184 0.138527172
## 185 0.138434554
## 186 0.135324295
## 187 0.135004414
## 188 0.134683486
## 189 0.132545057
## 190 0.127914434
## 191 0.117675059
## 192 0.115624498
## 193 0.114340203
## 194 0.109691494
## 195 0.107019191
## 196 0.103619927
## 197 0.099112409
## 198 0.098313540
## 199 0.097874976
## 200 0.094375344
## 201 0.094362151
## 202 0.093855935
## 203 0.091513622
## 204 0.088342117
## 205 0.088146515
## 206 0.086264813
## 207 0.081146836
## 208 0.079796626
## 209 0.075383354
## 210 0.074854772
## 211 0.072796943
## 212 0.065529347
## 213 0.061577445
## 214 0.053188612
## 215 0.052531496
## 216 0.047802713
## 217 0.044932347
## 218 0.039259044
## 219 0.034160142
## 220 0.029250149
## 221 0.015717043
## 222 0.009249729
## 223 0.004570935
## 224 0.002170120
## 225 0.000000000
## 226 0.000000000
## 227 0.000000000
## 228 0.000000000
## 229 0.000000000
## 230 0.000000000
## 231 0.000000000
## 232 0.000000000
## 233 0.000000000
## 234 0.000000000
## 235 0.000000000
## 236 0.000000000
## 237 0.000000000
## 238 0.000000000
## 239 0.000000000
## 240 0.000000000
## 241 0.000000000
## 242 0.000000000
## 243 0.000000000
## 244 0.000000000
## 245 0.000000000
## 246 0.000000000
## 247 0.000000000
## 248 0.000000000
## 249 0.000000000
## 250 0.000000000
## 251 0.000000000
## 252 0.000000000
## 253 0.000000000
## 254 0.000000000
## 255 0.000000000
## 256 0.000000000
## 257 0.000000000
## 258 0.000000000
## 259 0.000000000
## 260 0.000000000
## 261 0.000000000
## 262 0.000000000
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (testData$DX Dementia) > 66 cases (testData$DX CN).
## Area under the curve: 0.9746
## [1] "The auc value is:"
## Area under the curve: 0.9746

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)

  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # the first curve is drawn in blue; the remaining curves use col = i + 1
  legend("bottomright", legend = classes, col = c("blue", 2:length(classes) + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}

9.3.3 Elastic Net

9.3.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 221 samples
## 262 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa      
##   0      0.00100000  0.8008081   0.40492109
##   0      0.05357895  0.8008081   0.40492109
##   0      0.10615789  0.8008081   0.40492109
##   0      0.15873684  0.8008081   0.40492109
##   0      0.21131579  0.8008081   0.40492109
##   0      0.26389474  0.8008081   0.40492109
##   0      0.31647368  0.8008081   0.40492109
##   0      0.36905263  0.8008081   0.40492109
##   0      0.42163158  0.8008081   0.40492109
##   0      0.47421053  0.8008081   0.40492109
##   0      0.52678947  0.8008081   0.40492109
##   0      0.57936842  0.8008081   0.40492109
##   0      0.63194737  0.8008081   0.40492109
##   0      0.68452632  0.8008081   0.40492109
##   0      0.73710526  0.8008081   0.40492109
##   0      0.78968421  0.8008081   0.40492109
##   0      0.84226316  0.8008081   0.40492109
##   0      0.89484211  0.8008081   0.40492109
##   0      0.94742105  0.8008081   0.40492109
##   0      1.00000000  0.8008081   0.40492109
##   1      0.00100000  0.7060606   0.27955253
##   1      0.05357895  0.6607071  -0.06294811
##   1      0.10615789  0.7014141   0.00000000
##   1      0.15873684  0.7014141   0.00000000
##   1      0.21131579  0.7014141   0.00000000
##   1      0.26389474  0.7014141   0.00000000
##   1      0.31647368  0.7014141   0.00000000
##   1      0.36905263  0.7014141   0.00000000
##   1      0.42163158  0.7014141   0.00000000
##   1      0.47421053  0.7014141   0.00000000
##   1      0.52678947  0.7014141   0.00000000
##   1      0.57936842  0.7014141   0.00000000
##   1      0.63194737  0.7014141   0.00000000
##   1      0.68452632  0.7014141   0.00000000
##   1      0.73710526  0.7014141   0.00000000
##   1      0.78968421  0.7014141   0.00000000
##   1      0.84226316  0.7014141   0.00000000
##   1      0.89484211  0.7014141   0.00000000
##   1      0.94742105  0.7014141   0.00000000
##   1      1.00000000  0.7014141   0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 1.
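The selected `lambda = 1` sits at the upper boundary of the tuning grid, and accuracy is completely flat across all lambdas at `alpha = 0`, which suggests the linear grid is not resolving the penalty strength. A wider, log-spaced grid (a suggested alternative, not the setting used above) would let `caret` explore both sides of the optimum and some intermediate alphas:

```r
# Log-spaced lambda grid spanning 1e-4 to 10, with intermediate alpha values
param_grid_wide <- expand.grid(
  alpha  = seq(0, 1, by = 0.25),
  lambda = 10^seq(-4, 1, length.out = 20)
)
nrow(param_grid_wide)  # 100 candidate (alpha, lambda) pairs
```

If the best lambda still lands on a boundary of this grid, the range should be extended again in that direction.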
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7502096
FeatEval_Freq_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Freq_mean_accuracy_cv_ENM1)
## [1] 0.7502096
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

FeatEval_Freq_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.927601809954751"
print(FeatEval_Freq_ENM1_trainAccuracy)
## [1] 0.9276018
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Freq_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Freq_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       13
##   Dementia  0       15
##                                           
##                Accuracy : 0.8617          
##                  95% CI : (0.7751, 0.9243)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.0002470       
##                                           
##                   Kappa : 0.6184          
##                                           
##  Mcnemar's Test P-Value : 0.0008741       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.5357          
##          Pos Pred Value : 0.8354          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.7021          
##    Detection Prevalence : 0.8404          
##       Balanced Accuracy : 0.7679          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Freq_ENM1_Accuracy<-cm_FeatEval_Freq_ENM1$overall["Accuracy"]
cm_FeatEval_Freq_ENM1_Kappa<-cm_FeatEval_Freq_ENM1$overall["Kappa"]
print(cm_FeatEval_Freq_ENM1_Accuracy)
##  Accuracy 
## 0.8617021
print(cm_FeatEval_Freq_ENM1_Kappa)
##     Kappa 
## 0.6183635
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 262)
## 
##            Overall
## PC1         100.00
## PC3          66.97
## PC2          54.39
## cg07152869   42.71
## cg19503462   40.11
## cg09216282   38.78
## cg02872767   36.79
## cg11787167   36.26
## cg04109990   35.86
## cg01013522   35.50
## cg26757229   34.99
## cg26739327   34.40
## cg12858518   34.15
## cg04124201   34.11
## cg03982462   33.76
## cg06864789   33.72
## cg02356645   33.22
## cg15775217   33.09
## cg01680303   32.48
## cg12306781   32.08
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))

print(Ordered_importance_elastic_net_final_model1) 
  
}
##         Overall
## 1   0.585345684
## 2   0.392354340
## 3   0.318870835
## 4   0.250606349
## 5   0.235402247
## ...
## 261 0.004334459
## 262 0.001040732

(Output truncated: 262 rows in the original run. The row labels are bare indices because `dplyr::arrange()` dropped the data-frame row names, so the CpG identifiers are not shown.)
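The row labels in the output above are bare indices because `dplyr::arrange()` drops data-frame row names. A minimal sketch (toy data, hypothetical CpG names) of carrying the names through the sort with `tibble::rownames_to_column()`:

```r
# Toy varImp-style data frame: importance values with CpG row names
library(dplyr)
library(tibble)

imp <- data.frame(Overall = c(0.3, 0.9, 0.5),
                  row.names = c("cg_a", "cg_b", "cg_c"))

# Move the row names into a column BEFORE arrange(), which drops them
ordered_imp <- imp %>%
  rownames_to_column("Feature") %>%
  arrange(desc(Overall))

print(ordered_imp$Feature)  # "cg_b" "cg_c" "cg_a"
```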
if(METHOD_FEATURE_FLAG==1){
  # Multi-class case: for each feature, take the maximum importance
  # across the three classes and rank by it
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
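The `pmax()` ranking above can be checked on a toy importance table (hypothetical CpG names):

```r
library(dplyr)

# Toy per-class importance table, one row per feature
imp <- data.frame(
  CN       = c(0.2, 0.9, 0.1),
  Dementia = c(0.8, 0.3, 0.2),
  MCI      = c(0.5, 0.4, 0.7),
  row.names = c("cg_x", "cg_y", "cg_z")
)
imp$Feature <- rownames(imp)

# Rank each feature by its largest importance in any class
ranked <- imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))

print(ranked$Feature)  # "cg_y" "cg_x" "cg_z"
```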
if(METHOD_FEATURE_FLAG == 1){
  
  library(reshape2)  # provides melt()
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
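`melt()` here comes from the reshape2 package (load it with `library(reshape2)` if it is not already attached); a minimal sketch of the wide-to-long reshape it performs, on toy data:

```r
library(reshape2)

# Wide per-class importance table: one row per feature
imp <- data.frame(Feature = c("cg_a", "cg_b"),
                  CN = c(0.2, 0.9),
                  Dementia = c(0.8, 0.3))

# Long format: one row per (Feature, Class) pair, ready for ggplot
long <- melt(imp, id.vars = "Feature",
             variable.name = "Class", value.name = "Importance")

print(nrow(long))  # 4 rows = 2 features x 2 classes
```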
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("Top 20 features ranked by maximum class importance:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.9908

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # Use col = i + 1 for curve i so the colours match the legend below
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_ENM1_AUC<-mean_auc
}
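The one-versus-rest macro AUC computed above can be sketched end-to-end on toy data (hypothetical class probabilities; pROC assumed loaded as elsewhere in this report):

```r
library(pROC)

classes <- c("CN", "Dementia", "MCI")
true_dx <- factor(c("CN", "CN", "Dementia", "Dementia", "MCI", "MCI"),
                  levels = classes)

# Toy class-probability matrix: one column per class, rows sum to 1
probs <- rbind(c(0.8, 0.1, 0.1), c(0.7, 0.2, 0.1),
               c(0.1, 0.8, 0.1), c(0.2, 0.6, 0.2),
               c(0.1, 0.2, 0.7), c(0.2, 0.1, 0.7))
colnames(probs) <- classes

# One ROC curve per class: that class versus all the others
auc_values <- sapply(classes, function(cl) {
  binary_labels <- ifelse(true_dx == cl, 1, 0)
  as.numeric(auc(roc(binary_labels, probs[, cl], quiet = TRUE)))
})

mean_auc <- mean(auc_values)
print(mean_auc)  # 1 here: every class is perfectly separated in the toy data
```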
print(FeatEval_Freq_ENM1_AUC)
## Area under the curve: 0.9908

9.3.4. XGBoost

9.3.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)
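caret expands a large default grid for `xgbTree` (108 parameter combinations in the output below); a smaller explicit grid can be supplied through `train()`'s `tuneGrid` argument — a sketch with hypothetical values:

```r
# Sketch: a constrained tuning grid for caret's "xgbTree" method
# (hypothetical values; pass via caret::train(..., tuneGrid = xgb_grid))
xgb_grid <- expand.grid(
  nrounds          = c(50, 100),
  max_depth        = c(1, 3),
  eta              = c(0.3, 0.4),
  gamma            = 0,
  colsample_bytree = 0.6,
  min_child_weight = 1,
  subsample        = 0.5
)

print(nrow(xgb_grid))  # 8 candidate models instead of 108
```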

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 221 samples
## 262 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa     
##   0.3  1          0.6               0.50        50      0.7375758  0.28800531
##   0.3  1          0.6               0.50       100      0.7194949  0.25190111
##   0.3  1          0.6               0.50       150      0.7420202  0.32069457
##   0.3  1          0.6               0.75        50      0.6968687  0.17774879
##   0.3  1          0.6               0.75       100      0.7196970  0.26521362
##   0.3  1          0.6               0.75       150      0.7379798  0.30336975
##   0.3  1          0.6               1.00        50      0.6877778  0.13053642
##   0.3  1          0.6               1.00       100      0.7059596  0.17316451
##   0.3  1          0.6               1.00       150      0.7059596  0.19079233
##   0.3  1          0.8               0.50        50      0.6741414  0.11598872
##   0.3  1          0.8               0.50       100      0.7150505  0.24532729
##   0.3  1          0.8               0.50       150      0.7379798  0.30547167
##   0.3  1          0.8               0.75        50      0.6698990  0.12690654
##   0.3  1          0.8               0.75       100      0.7105051  0.20798864
##   0.3  1          0.8               0.75       150      0.7239394  0.23791115
##   0.3  1          0.8               1.00        50      0.6652525  0.04451096
##   0.3  1          0.8               1.00       100      0.7059596  0.17283331
##   0.3  1          0.8               1.00       150      0.7241414  0.23907740
##   0.3  2          0.6               0.50        50      0.7378788  0.25264390
##   0.3  2          0.6               0.50       100      0.7559596  0.30367780
##   0.3  2          0.6               0.50       150      0.7560606  0.31080994
##   0.3  2          0.6               0.75        50      0.7060606  0.20384917
##   0.3  2          0.6               0.75       100      0.6969697  0.15361838
##   0.3  2          0.6               0.75       150      0.6923232  0.14778143
##   0.3  2          0.6               1.00        50      0.7014141  0.14879523
##   0.3  2          0.6               1.00       100      0.7239394  0.22648966
##   0.3  2          0.6               1.00       150      0.7285859  0.24129704
##   0.3  2          0.8               0.50        50      0.7240404  0.18729686
##   0.3  2          0.8               0.50       100      0.7331313  0.23147835
##   0.3  2          0.8               0.50       150      0.7330303  0.23165116
##   0.3  2          0.8               0.75        50      0.7333333  0.25097341
##   0.3  2          0.8               0.75       100      0.7378788  0.27160403
##   0.3  2          0.8               0.75       150      0.7333333  0.26186855
##   0.3  2          0.8               1.00        50      0.7149495  0.17704937
##   0.3  2          0.8               1.00       100      0.7192929  0.19640248
##   0.3  2          0.8               1.00       150      0.7193939  0.20674040
##   0.3  3          0.6               0.50        50      0.7242424  0.23903352
##   0.3  3          0.6               0.50       100      0.7288889  0.28279187
##   0.3  3          0.6               0.50       150      0.7333333  0.29042435
##   0.3  3          0.6               0.75        50      0.7423232  0.25988868
##   0.3  3          0.6               0.75       100      0.7558586  0.28240411
##   0.3  3          0.6               0.75       150      0.7513131  0.27408328
##   0.3  3          0.6               1.00        50      0.7058586  0.16526445
##   0.3  3          0.6               1.00       100      0.6966667  0.16194566
##   0.3  3          0.6               1.00       150      0.6967677  0.15233833
##   0.3  3          0.8               0.50        50      0.6923232  0.14848670
##   0.3  3          0.8               0.50       100      0.7193939  0.22201914
##   0.3  3          0.8               0.50       150      0.7104040  0.20334224
##   0.3  3          0.8               0.75        50      0.7466667  0.27975569
##   0.3  3          0.8               0.75       100      0.7466667  0.27914089
##   0.3  3          0.8               0.75       150      0.7421212  0.26993484
##   0.3  3          0.8               1.00        50      0.6924242  0.14700826
##   0.3  3          0.8               1.00       100      0.7058586  0.17997977
##   0.3  3          0.8               1.00       150      0.7013131  0.17210743
##   0.4  1          0.6               0.50        50      0.6921212  0.20771223
##   0.4  1          0.6               0.50       100      0.7332323  0.29093782
##   0.4  1          0.6               0.50       150      0.7465657  0.34564320
##   0.4  1          0.6               0.75        50      0.7060606  0.24622458
##   0.4  1          0.6               0.75       100      0.7151515  0.25641203
##   0.4  1          0.6               0.75       150      0.7466667  0.34216139
##   0.4  1          0.6               1.00        50      0.7015152  0.16281870
##   0.4  1          0.6               1.00       100      0.7284848  0.27202426
##   0.4  1          0.6               1.00       150      0.7421212  0.29981098
##   0.4  1          0.8               0.50        50      0.6970707  0.21840052
##   0.4  1          0.8               0.50       100      0.7422222  0.31105268
##   0.4  1          0.8               0.50       150      0.7466667  0.34193095
##   0.4  1          0.8               0.75        50      0.6972727  0.18900138
##   0.4  1          0.8               0.75       100      0.7198990  0.26449405
##   0.4  1          0.8               0.75       150      0.7377778  0.29121197
##   0.4  1          0.8               1.00        50      0.7149495  0.19139081
##   0.4  1          0.8               1.00       100      0.7104040  0.19791977
##   0.4  1          0.8               1.00       150      0.7015152  0.17278994
##   0.4  2          0.6               0.50        50      0.7196970  0.25254853
##   0.4  2          0.6               0.50       100      0.7195960  0.26120177
##   0.4  2          0.6               0.50       150      0.7332323  0.30642260
##   0.4  2          0.6               0.75        50      0.7104040  0.20606043
##   0.4  2          0.6               0.75       100      0.7283838  0.24927834
##   0.4  2          0.6               0.75       150      0.7283838  0.26231784
##   0.4  2          0.6               1.00        50      0.6835354  0.15816095
##   0.4  2          0.6               1.00       100      0.7015152  0.18207143
##   0.4  2          0.6               1.00       150      0.7015152  0.18207143
##   0.4  2          0.8               0.50        50      0.7148485  0.17884624
##   0.4  2          0.8               0.50       100      0.7193939  0.20191030
##   0.4  2          0.8               0.50       150      0.7103030  0.18424197
##   0.4  2          0.8               0.75        50      0.6968687  0.14002491
##   0.4  2          0.8               0.75       100      0.7105051  0.19119424
##   0.4  2          0.8               0.75       150      0.6968687  0.16009774
##   0.4  2          0.8               1.00        50      0.7240404  0.20358321
##   0.4  2          0.8               1.00       100      0.7286869  0.22573975
##   0.4  2          0.8               1.00       150      0.7286869  0.22573975
##   0.4  3          0.6               0.50        50      0.7376768  0.29409005
##   0.4  3          0.6               0.50       100      0.7606061  0.33995025
##   0.4  3          0.6               0.50       150      0.7515152  0.33926220
##   0.4  3          0.6               0.75        50      0.7059596  0.20511105
##   0.4  3          0.6               0.75       100      0.7284848  0.25730846
##   0.4  3          0.6               0.75       150      0.7375758  0.27592178
##   0.4  3          0.6               1.00        50      0.7060606  0.14960289
##   0.4  3          0.6               1.00       100      0.7152525  0.15771591
##   0.4  3          0.6               1.00       150      0.7152525  0.15771591
##   0.4  3          0.8               0.50        50      0.7058586  0.21715174
##   0.4  3          0.8               0.50       100      0.7375758  0.28931672
##   0.4  3          0.8               0.50       150      0.7375758  0.30426473
##   0.4  3          0.8               0.75        50      0.7059596  0.18868925
##   0.4  3          0.8               0.75       100      0.7193939  0.20770923
##   0.4  3          0.8               0.75       150      0.7283838  0.23283863
##   0.4  3          0.8               1.00        50      0.6742424  0.11264961
##   0.4  3          0.8               1.00       100      0.6920202  0.15567059
##   0.4  3          0.8               1.00       150      0.6920202  0.15567059
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.4, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.7185905
FeatEval_Freq_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Freq_mean_accuracy_cv_xgb)
## [1] 0.7185905
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Freq_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Freq_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Freq_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Freq_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       60       13
##   Dementia  6       15
##                                           
##                Accuracy : 0.7979          
##                  95% CI : (0.7025, 0.8737)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.02457         
##                                           
##                   Kappa : 0.4793          
##                                           
##  Mcnemar's Test P-Value : 0.16867         
##                                           
##             Sensitivity : 0.9091          
##             Specificity : 0.5357          
##          Pos Pred Value : 0.8219          
##          Neg Pred Value : 0.7143          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6383          
##    Detection Prevalence : 0.7766          
##       Balanced Accuracy : 0.7224          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Freq_xgb_Accuracy <-cm_FeatEval_Freq_xgb$overall["Accuracy"]
cm_FeatEval_Freq_xgb_Kappa <-cm_FeatEval_Freq_xgb$overall["Kappa"]

print(cm_FeatEval_Freq_xgb_Accuracy)
##  Accuracy 
## 0.7978723
print(cm_FeatEval_Freq_xgb_Kappa)
##     Kappa 
## 0.4793003
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 262)
## 
##            Overall
## cg16390578  100.00
## cg25561557   58.69
## cg18339359   55.31
## cg05321907   53.65
## cg19512141   51.46
## cg16268937   50.34
## cg06264882   48.25
## cg04109990   47.21
## cg11358878   38.27
## cg20913114   37.56
## cg19503462   34.69
## cg16715186   33.82
## cg14904299   33.33
## cg13885788   32.62
## cg24422984   32.42
## cg06032337   31.73
## PC1          31.54
## cg17296678   29.90
## cg12466610   29.84
## cg10542624   29.43
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain       Cover   Frequency   Importance
##          <char>        <num>       <num>       <num>        <num>
##   1: cg16390578 0.0583251757 0.027361792 0.025641026 0.0583251757
##   2: cg25561557 0.0342296956 0.026349461 0.010256410 0.0342296956
##   3: cg18339359 0.0322570590 0.020015624 0.010256410 0.0322570590
##   4: cg05321907 0.0312903214 0.017960831 0.010256410 0.0312903214
##   5: cg19512141 0.0300148699 0.026141931 0.010256410 0.0300148699
##  ---                                                             
## 124: cg12240569 0.0004104814 0.001799308 0.005128205 0.0004104814
## 125: cg16361249 0.0003934761 0.001974768 0.005128205 0.0003934761
## 126: cg11716267 0.0003697551 0.001770879 0.005128205 0.0003697551
## 127: cg03057303 0.0002716634 0.002404129 0.005128205 0.0002716634
## 128: cg14465143 0.0001097841 0.002013771 0.005128205 0.0001097841
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.8306

if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # Use col = i + 1 for curve i so the colours match the legend below
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_xgb_AUC <- mean_auc
}
print(FeatEval_Freq_xgb_AUC)
## Area under the curve: 0.8306

9.3.5. Random Forest

9.3.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)
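caret's default `rf` grid only tried three `mtry` values (2, 132 and 262 in the output below); an explicit grid around sqrt(p) can be passed the same way — a sketch, values hypothetical:

```r
# Sketch: an explicit mtry grid for caret's "rf" method
# (pass via caret::train(..., tuneGrid = rf_grid))
p <- 262  # number of predictors in train_data_RFM1
rf_grid <- expand.grid(mtry = unique(round(c(sqrt(p), p / 4, p / 2, p))))

print(rf_grid$mtry)  # 4 candidate values, from ~sqrt(p) up to p
```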

print(rf_model)
## Random Forest 
## 
## 221 samples
## 262 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 177, 176, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.7014141  0.00000000
##   132   0.7105051  0.06261981
##   262   0.7059596  0.04398436
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 132.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.7059596
FeatEval_Freq_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Freq_mean_accuracy_cv_rf)
## [1] 0.7059596
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")


train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Freq_rf_trainAccuracy<-train_accuracy
print(FeatEval_Freq_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Freq_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Freq_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       66       25
##   Dementia  0        3
##                                           
##                Accuracy : 0.734           
##                  95% CI : (0.6329, 0.8199)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 0.2901          
##                                           
##                   Kappa : 0.1442          
##                                           
##  Mcnemar's Test P-Value : 1.587e-06       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.1071          
##          Pos Pred Value : 0.7253          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7021          
##          Detection Rate : 0.7021          
##    Detection Prevalence : 0.9681          
##       Balanced Accuracy : 0.5536          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Freq_rf_Accuracy<-cm_FeatEval_Freq_rf$overall["Accuracy"]
print(cm_FeatEval_Freq_rf_Accuracy)
##  Accuracy 
## 0.7340426
cm_FeatEval_Freq_rf_Kappa<-cm_FeatEval_Freq_rf$overall["Kappa"]
print(cm_FeatEval_Freq_rf_Kappa)
##     Kappa 
## 0.1442098
importance_rf_model <- varImp(rf_model)

print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 262)
## 
##            Importance
## cg06864789     100.00
## cg14252149      74.13
## cg01013522      71.25
## cg12556569      69.16
## cg07584620      67.30
## cg21533482      66.42
## cg24861747      65.83
## cg25561557      65.40
## cg27341708      63.62
## cg05161773      63.07
## cg05096415      62.95
## cg22681945      62.33
## cg23698271      61.93
## cg12080266      61.79
## cg04771146      60.26
## cg18037388      59.31
## cg17329602      58.87
## cg02078724      58.71
## cg12776173      58.55
## cg04124201      58.36
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)
  
  library(dplyr)
  # Keep the CpG names: dplyr::arrange() drops data-frame row names
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(MCI))
  
  print(Ordered_importance_rf_final_model)
  
}
if( METHOD_FEATURE_FLAG==4||METHOD_FEATURE_FLAG==6){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)
  
  library(dplyr)
  # Keep the CpG names: dplyr::arrange() drops data-frame row names
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(Dementia))
  
  print(Ordered_importance_rf_final_model)
  
}
##               CN     Dementia
## 1    3.892657193  3.892657193
## 2    2.389650279  2.389650279
## 3    2.222413359  2.222413359
## 4    2.100794283  2.100794283
## 5    1.992528805  1.992528805
## 6    1.941750456  1.941750456
## 7    1.906983119  1.906983119
## 8    1.882163259  1.882163259
## 9    1.778681339  1.778681339
## 10   1.746662138  1.746662138
## 11   1.740078443  1.740078443
## 12   1.703625567  1.703625567
## 13   1.680807609  1.680807609
## 14   1.672681299  1.672681299
##  [ rows 15-262 omitted: importance values decrease from 1.583391071 to -1.917895888 ]
if (METHOD_FEATURE_FLAG == 3){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)
  
  library(dplyr)
  
  # Order the features by decreasing importance (CI column)
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    arrange(desc(CI))
  
  print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, keep the maximum
  # importance value across classes for each feature.
  # Add a column for the maximum importance.
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
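The max-across-classes rule used above can be checked on a toy data frame; the feature names and importance values below are illustrative, not taken from the real model.

```r
library(dplyr)

# Toy per-class importances: keep the maximum across classes per feature,
# then sort so the most important feature comes first.
imp <- data.frame(Feature = c("cg1", "cg2"),
                  CN = c(0.2, 0.9), Dementia = c(0.5, 0.1), MCI = c(0.3, 0.4))

out <- imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))

print(out)   # cg2 (0.9) ranks above cg1 (0.5)
```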
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("The top 20 features based on the max-importance method:")
  print(head(importance_rf_model_df, n = 20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## Area under the curve: 0.7957

if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  # Use the held-out split from this section (was: testData)
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # Use the same colour indices for the curves and the legend
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
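The one-versus-rest loop above can be factored into a small reusable helper. Below is a minimal sketch on toy data; the function name `ovr_macro_auc` and the toy probabilities are ours, not part of the pipeline.

```r
library(pROC)

# Hypothetical helper: macro-averaged one-vs-rest AUC for a factor of labels
# and a probability matrix whose columns are named after the classes.
ovr_macro_auc <- function(labels, prob_matrix) {
  aucs <- sapply(levels(labels), function(cl) {
    binary <- as.integer(labels == cl)                  # current class vs rest
    as.numeric(pROC::roc(binary, prob_matrix[, cl], quiet = TRUE)$auc)
  })
  mean(aucs)
}

# Toy check: three classes with perfectly separated probabilities.
y <- factor(c("CN", "CN", "Dementia", "Dementia", "MCI", "MCI"))
p <- rbind(c(0.8, 0.1, 0.1), c(0.7, 0.2, 0.1),
           c(0.1, 0.8, 0.1), c(0.2, 0.6, 0.2),
           c(0.1, 0.2, 0.7), c(0.2, 0.1, 0.7))
colnames(p) <- levels(y)
ovr_macro_auc(y, p)   # 1 when every class is ranked perfectly
```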
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_rf_AUC<-mean_auc
}
print(FeatEval_Freq_rf_AUC)
## Area under the curve: 0.7957

9.3.6. SVM

9.3.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 221 samples
## 262 predictors
##   2 classes: 'CN', 'Dementia' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 176, 177, 177, 177, 177 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.9050505  0.7808768
##   0.50  0.9095960  0.7902343
##   1.00  0.9141414  0.7995441
## 
## Tuning parameter 'sigma' was held constant at a value of 0.00192655
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.00192655 and C = 1.
print(svm_model$bestTune)
##        sigma C
## 3 0.00192655 1
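caret's "largest Accuracy" selection rule can be reproduced by hand from the resampling grid printed above; the values below are copied from that printout.

```r
# Resampling results across the tuning grid (from the caret printout above).
res <- data.frame(C = c(0.25, 0.50, 1.00),
                  Accuracy = c(0.9050505, 0.9095960, 0.9141414))

# "Accuracy was used to select the optimal model using the largest value."
res$C[which.max(res$Accuracy)]   # 1, matching bestTune
```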
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.909596
FeatEval_Freq_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Freq_mean_accuracy_cv_svm)
## [1] 0.909596
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Freq_svm_trainAccuracy <- train_accuracy
print(FeatEval_Freq_svm_trainAccuracy)
## [1] 1
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Freq_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Freq_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia
##   CN       60        1
##   Dementia  6       27
##                                           
##                Accuracy : 0.9255          
##                  95% CI : (0.8526, 0.9695)
##     No Information Rate : 0.7021          
##     P-Value [Acc > NIR] : 1.131e-07       
##                                           
##                   Kappa : 0.8307          
##                                           
##  Mcnemar's Test P-Value : 0.1306          
##                                           
##             Sensitivity : 0.9091          
##             Specificity : 0.9643          
##          Pos Pred Value : 0.9836          
##          Neg Pred Value : 0.8182          
##              Prevalence : 0.7021          
##          Detection Rate : 0.6383          
##    Detection Prevalence : 0.6489          
##       Balanced Accuracy : 0.9367          
##                                           
##        'Positive' Class : CN              
## 
cm_FeatEval_Freq_svm_Accuracy <- cm_FeatEval_Freq_svm$overall["Accuracy"]
cm_FeatEval_Freq_svm_Kappa <- cm_FeatEval_Freq_svm$overall["Kappa"]
print(cm_FeatEval_Freq_svm_Accuracy)
##  Accuracy 
## 0.9255319
print(cm_FeatEval_Freq_svm_Kappa)
##     Kappa 
## 0.8306742
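Only the overall Accuracy and Kappa are extracted above; a caret `confusionMatrix` object also stores per-class metrics in its `byClass` slot. A minimal, self-contained sketch on toy labels (the vectors are illustrative, not taken from this data):

```r
library(caret)

# Toy predictions and reference labels for a two-class problem.
pred <- factor(c("CN", "CN", "Dementia", "CN", "Dementia"),
               levels = c("CN", "Dementia"))
obs  <- factor(c("CN", "Dementia", "Dementia", "CN", "Dementia"),
               levels = c("CN", "Dementia"))

cm <- caret::confusionMatrix(pred, obs)

# Per-class metrics for the positive class ("CN" here) live in byClass.
cm$byClass[c("Sensitivity", "Specificity", "Balanced Accuracy")]
```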

Let’s take a look at the feature importance of the trained model.

library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 315 rows and 263 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg27452255     0.8857143   1.142857      1.257143        0.02539683
## 2 cg22901347     1.0000000   1.142857      1.142857        0.02539683
## 3 cg04798314     0.8857143   1.142857      1.142857        0.02539683
## 4 cg03628603     1.0285714   1.142857      1.142857        0.02539683
## 5 cg25758034     1.0000000   1.142857      1.142857        0.02539683
## 6 cg12108278     1.0285714   1.142857      1.257143        0.02539683
plot(importance_SVM)

library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
    nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
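With `loss = "ce"`, `FeatureImp` reports, for each feature, the ratio of the classification error after permuting that feature to the baseline error (ratios near 1 mean the feature barely matters). A toy sketch of that ratio, with all numbers illustrative:

```r
# Permutation importance as an error ratio: permute one feature's values,
# re-predict, and divide the new classification error by the baseline error.
ce <- function(truth, pred) mean(truth != pred)

y_true     <- c(1, 1, 0, 0, 1, 0)
y_hat_base <- c(1, 1, 0, 0, 0, 0)   # baseline predictions: 1 error
y_hat_perm <- c(1, 0, 0, 1, 0, 0)   # after permuting a feature: 3 errors

baseline_err <- ce(y_true, y_hat_base)   # 1/6
permuted_err <- ce(y_true, y_hat_perm)   # 3/6
permuted_err / baseline_err              # 3: permuting this feature hurts
```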
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls > cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "Dementia"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "Dementia"] in 28 controls (test_data_SVM1$DX Dementia) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9816
## [1] "The AUC value is:"
## Area under the curve: 0.9816

if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  # Use the held-out split from this section (was: testData)
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # Use the same colour indices for the curves and the legend
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_svm_AUC<-mean_auc
}
print(FeatEval_Freq_svm_AUC)
## Area under the curve: 0.9816

10. Performance Metrics

In the INPUT Session, “Metrics_Table_Output_FLAG” is the flag that controls whether the metrics of this file are written out. The output covers the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.
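As a sketch of how such a flag might gate the export, using a stand-in table and a hypothetical file name (both `metrics_df_demo` and `Metrics_demo.csv` are ours, not from the pipeline):

```r
# Hypothetical sketch: only write the metrics table when the INPUT-Session
# flag asks for it. The table and file name below are illustrative.
Metrics_Table_Output_FLAG <- 1                        # set in the INPUT Session
metrics_df_demo <- data.frame(Metric = "AUC", SVM = 0.98)

if (Metrics_Table_Output_FLAG == 1) {
  out_path <- file.path(tempdir(), "Metrics_demo.csv")
  write.csv(metrics_df_demo, out_path, row.names = FALSE)
}
file.exists(out_path)   # TRUE when the flag is on
```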

Feature_and_model_Metrics <- c("Training Accuracy", "Test Accuracy", "Test Kappa", "AUC", "Average Test Accuracy during Cross Validation")

ModelTrain_stage_Logistic_metrics_ModelTrainStage <- c(modelTrain_LRM1_trainAccuracy, cm_modelTrain_LRM1_Accuracy, cm_modelTrain_LRM1_Kappa,modelTrain_LRM1_AUC, modelTrain_mean_accuracy_cv_LRM1) 

ModelTrain_stage_Logistic_metrics_Feature_Mean<-c(FeatEval_Mean_LRM1_trainAccuracy,
cm_FeatEval_Mean_LRM1_Accuracy,cm_FeatEval_Mean_LRM1_Kappa,FeatEval_Mean_LRM1_AUC, FeatEval_Mean_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics_Feature_Median<-c(FeatEval_Median_LRM1_trainAccuracy,
cm_FeatEval_Median_LRM1_Accuracy,cm_FeatEval_Median_LRM1_Kappa,FeatEval_Median_LRM1_AUC, FeatEval_Median_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics_Feature_Freq<-c(FeatEval_Freq_LRM1_trainAccuracy,
cm_FeatEval_Freq_LRM1_Accuracy,cm_FeatEval_Freq_LRM1_Kappa,FeatEval_Freq_LRM1_AUC,FeatEval_Freq_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics<-c(ModelTrain_stage_Logistic_metrics_ModelTrainStage, ModelTrain_stage_Logistic_metrics_Feature_Mean,ModelTrain_stage_Logistic_metrics_Feature_Median,ModelTrain_stage_Logistic_metrics_Feature_Freq)
ModelTrain_stage_ElasticNet_metrics_ModelTrainStage <- c(modelTrain_ENM1_trainAccuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_ENM1_Kappa,modelTrain_ENM1_AUC, modelTrain_mean_accuracy_cv_ENM1) 

ModelTrain_stage_ElasticNet_metrics_Feature_Mean<-c(FeatEval_Mean_ENM1_trainAccuracy,
cm_FeatEval_Mean_ENM1_Accuracy,cm_FeatEval_Mean_ENM1_Kappa,FeatEval_Mean_ENM1_AUC, FeatEval_Mean_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics_Feature_Median<-c(FeatEval_Median_ENM1_trainAccuracy,
cm_FeatEval_Median_ENM1_Accuracy,cm_FeatEval_Median_ENM1_Kappa,FeatEval_Median_ENM1_AUC, FeatEval_Median_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics_Feature_Freq<-c(FeatEval_Freq_ENM1_trainAccuracy,
cm_FeatEval_Freq_ENM1_Accuracy,cm_FeatEval_Freq_ENM1_Kappa,FeatEval_Freq_ENM1_AUC,FeatEval_Freq_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics<-c(ModelTrain_stage_ElasticNet_metrics_ModelTrainStage, ModelTrain_stage_ElasticNet_metrics_Feature_Mean,ModelTrain_stage_ElasticNet_metrics_Feature_Median,ModelTrain_stage_ElasticNet_metrics_Feature_Freq)
ModelTrain_stage_XGBoost_metrics_ModelTrainStage <- c(modelTrain_xgb_trainAccuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_xgb_Kappa,modelTrain_xgb_AUC, modelTrain_mean_accuracy_cv_xgb) 

ModelTrain_stage_XGBoost_metrics_Feature_Mean<-c(FeatEval_Mean_xgb_trainAccuracy,
cm_FeatEval_Mean_xgb_Accuracy,cm_FeatEval_Mean_xgb_Kappa,FeatEval_Mean_xgb_AUC, FeatEval_Mean_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics_Feature_Median<-c(FeatEval_Median_xgb_trainAccuracy,
cm_FeatEval_Median_xgb_Accuracy,cm_FeatEval_Median_xgb_Kappa,FeatEval_Median_xgb_AUC, FeatEval_Median_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics_Feature_Freq<-c(FeatEval_Freq_xgb_trainAccuracy,
cm_FeatEval_Freq_xgb_Accuracy,cm_FeatEval_Freq_xgb_Kappa,FeatEval_Freq_xgb_AUC,FeatEval_Freq_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics<-c(ModelTrain_stage_XGBoost_metrics_ModelTrainStage, ModelTrain_stage_XGBoost_metrics_Feature_Mean,ModelTrain_stage_XGBoost_metrics_Feature_Median,ModelTrain_stage_XGBoost_metrics_Feature_Freq)
ModelTrain_stage_RandomForest_metrics_ModelTrainStage <- c(modelTrain_rf_trainAccuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_rf_Kappa,modelTrain_rf_AUC, modelTrain_mean_accuracy_cv_rf) 

ModelTrain_stage_RandomForest_metrics_Feature_Mean<-c(FeatEval_Mean_rf_trainAccuracy,
cm_FeatEval_Mean_rf_Accuracy,cm_FeatEval_Mean_rf_Kappa,FeatEval_Mean_rf_AUC, FeatEval_Mean_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics_Feature_Median<-c(FeatEval_Median_rf_trainAccuracy,
cm_FeatEval_Median_rf_Accuracy,cm_FeatEval_Median_rf_Kappa,FeatEval_Median_rf_AUC, FeatEval_Median_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics_Feature_Freq<-c(FeatEval_Freq_rf_trainAccuracy,
cm_FeatEval_Freq_rf_Accuracy,cm_FeatEval_Freq_rf_Kappa,FeatEval_Freq_rf_AUC,FeatEval_Freq_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics<-c(ModelTrain_stage_RandomForest_metrics_ModelTrainStage, ModelTrain_stage_RandomForest_metrics_Feature_Mean,ModelTrain_stage_RandomForest_metrics_Feature_Median,ModelTrain_stage_RandomForest_metrics_Feature_Freq)
ModelTrain_stage_SVM_metrics_ModelTrainStage <- c(modelTrain_svm_trainAccuracy, cm_modelTrain_svm_Accuracy, cm_modelTrain_svm_Kappa,modelTrain_svm_AUC, modelTrain_mean_accuracy_cv_svm) 

ModelTrain_stage_SVM_metrics_Feature_Mean<-c(FeatEval_Mean_svm_trainAccuracy,
cm_FeatEval_Mean_svm_Accuracy,cm_FeatEval_Mean_svm_Kappa,FeatEval_Mean_svm_AUC, FeatEval_Mean_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics_Feature_Median<-c(FeatEval_Median_svm_trainAccuracy,
cm_FeatEval_Median_svm_Accuracy,cm_FeatEval_Median_svm_Kappa,FeatEval_Median_svm_AUC, FeatEval_Median_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics_Feature_Freq<-c(FeatEval_Freq_svm_trainAccuracy,
cm_FeatEval_Freq_svm_Accuracy,cm_FeatEval_Freq_svm_Kappa,FeatEval_Freq_svm_AUC,FeatEval_Freq_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics<-c(ModelTrain_stage_SVM_metrics_ModelTrainStage, ModelTrain_stage_SVM_metrics_Feature_Mean,ModelTrain_stage_SVM_metrics_Feature_Median,ModelTrain_stage_SVM_metrics_Feature_Freq)
if(METHOD_FEATURE_FLAG==1){
  classificationType = "Multiclass"
}
if(METHOD_FEATURE_FLAG==2){
  classificationType = "Multiclass and use PCA"
}
if(METHOD_FEATURE_FLAG==3){
  classificationType = "Binary"
}
if(METHOD_FEATURE_FLAG==4){
  classificationType = "CN vs Dementia (AD)"
}
if(METHOD_FEATURE_FLAG==5){
  classificationType = "CN vs MCI"
}
if(METHOD_FEATURE_FLAG==6){
  classificationType = "MCI vs Dementia"
}
library(dplyr)

Metrics_results_df <- data.frame(
  `Number_of_CpG_used` = rep(Number_N_TopNCpGs, 20),
  `Number_of_Phenotype_Features_Used` = rep(5, 20),
  `Total_Number_of_features_before_Preprocessing` = rep(Number_N_TopNCpGs + 5, 20),
  `Number_of_features_after_processing` = rep(Num_feaForProcess, 20),
  `Classification_Type` = rep(classificationType, 20),
  `Number_of_Key_features_Selected_(Mean,Median)` = rep(INPUT_NUMBER_FEATURES, 20),
  `Number_of_Key_features_remained_based_on_frequency_methods` = rep(Num_KeyFea_Frequency, 20),
  `Metrics_Stage` = c(rep("Model Train Stage", 5),
                      rep("Key Feature Evaluation (Select based on Mean) ", 5),
                      rep("Key Feature Evaluation (Select based on Median) ", 5),
                      rep("Key Feature Evaluation (Select based on Frequency) ", 5)),
  `Metric` = rep(Feature_and_model_Metrics, 4),
  `Logistic_regression` = ModelTrain_stage_Logistic_metrics,
  `Elastic_Net` = ModelTrain_stage_ElasticNet_metrics,
  `XGBoost` = ModelTrain_stage_XGBoost_metrics,
  `Random_Forest` = ModelTrain_stage_RandomForest_metrics,
  `SVM` = ModelTrain_stage_SVM_metrics
)


print(Metrics_results_df)
##    Number_of_CpG_used Number_of_Phenotype_Features_Used Total_Number_of_features_before_Preprocessing Number_of_features_after_processing Classification_Type
## 1                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 2                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 3                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 4                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 5                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 6                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 7                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 8                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 9                5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 10               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 11               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 12               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 13               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 14               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 15               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 16               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 17               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 18               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 19               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
## 20               5000                                 5                                          5005                                 282 CN vs Dementia (AD)
##    Number_of_Key_features_Selected_.Mean.Median. Number_of_Key_features_remained_based_on_frequency_methods                                       Metrics_Stage
## 1                                            250                                                        262                                   Model Train Stage
## 2                                            250                                                        262                                   Model Train Stage
## 3                                            250                                                        262                                   Model Train Stage
## 4                                            250                                                        262                                   Model Train Stage
## 5                                            250                                                        262                                   Model Train Stage
## 6                                            250                                                        262      Key Feature Evaluation (Select based on Mean) 
## 7                                            250                                                        262      Key Feature Evaluation (Select based on Mean) 
## 8                                            250                                                        262      Key Feature Evaluation (Select based on Mean) 
## 9                                            250                                                        262      Key Feature Evaluation (Select based on Mean) 
## 10                                           250                                                        262      Key Feature Evaluation (Select based on Mean) 
## 11                                           250                                                        262    Key Feature Evaluation (Select based on Median) 
## 12                                           250                                                        262    Key Feature Evaluation (Select based on Median) 
## 13                                           250                                                        262    Key Feature Evaluation (Select based on Median) 
## 14                                           250                                                        262    Key Feature Evaluation (Select based on Median) 
## 15                                           250                                                        262    Key Feature Evaluation (Select based on Median) 
## 16                                           250                                                        262 Key Feature Evaluation (Select based on Frequency) 
## 17                                           250                                                        262 Key Feature Evaluation (Select based on Frequency) 
## 18                                           250                                                        262 Key Feature Evaluation (Select based on Frequency) 
## 19                                           250                                                        262 Key Feature Evaluation (Select based on Frequency) 
## 20                                           250                                                        262 Key Feature Evaluation (Select based on Frequency) 
##                                           Metric Logistic_regression Elastic_Net   XGBoost Random_Forest       SVM
## 1                              Training Accuracy           1.0000000   0.9321267 1.0000000    1.00000000 0.9864253
## 2                                  Test Accuracy           0.9361702   0.8723404 0.7553191    0.72340426 0.8936170
## 3                                     Test Kappa           0.8441989   0.6518519 0.3247970    0.09748892 0.7604485
## 4                                            AUC           0.9816017   0.9945887 0.7586580    0.79274892 0.9821429
## 5  Average Test Accuracy during Cross Validation           0.7397755   0.7454874 0.7143734    0.70289562 0.9093939
## 6                              Training Accuracy           1.0000000   0.9276018 1.0000000    1.00000000 0.9954751
## 7                                  Test Accuracy           0.9148936   0.8617021 0.7765957    0.70212766 0.9148936
## 8                                     Test Kappa           0.7878104   0.6183635 0.3835103    0.00000000 0.8045738
## 9                                            AUC           0.9788961   0.9886364 0.7943723    0.79383117 0.9659091
## 10 Average Test Accuracy during Cross Validation           0.7597531   0.7437854 0.7153068    0.69989899 0.9247138
## 11                             Training Accuracy           1.0000000   0.9321267 1.0000000    1.00000000 0.9954751
## 12                                 Test Accuracy           0.9148936   0.8510638 0.7659574    0.74468085 0.9574468
## 13                                    Test Kappa           0.7878104   0.5840708 0.3763571    0.18965517 0.9003181
## 14                                           AUC           0.9837662   0.9935065 0.8198052    0.77624459 0.9745671
## 15 Average Test Accuracy during Cross Validation           0.7432660   0.7480000 0.7226983    0.70292929 0.9260606
## 16                             Training Accuracy           1.0000000   0.9276018 1.0000000    1.00000000 1.0000000
## 17                                 Test Accuracy           0.9148936   0.8617021 0.7978723    0.73404255 0.9255319
## 18                                    Test Kappa           0.7878104   0.6183635 0.4793003    0.14420976 0.8306742
## 19                                           AUC           0.9783550   0.9908009 0.8306277    0.79572511 0.9816017
## 20 Average Test Accuracy during Cross Validation           0.7432884   0.7502096 0.7185905    0.70595960 0.9095960

Write out the model metrics data frame to a CSV file if FLAG_WRITE_METRICS_DF = TRUE:

# Write the performance metrics data frame to disk when the flag is set
if (FLAG_WRITE_METRICS_DF) {
  write.csv(Metrics_results_df, OUTUT_PerformanceMetricsCSV_PATHNAME, row.names = FALSE)
  print("Metrics Performance output path:")
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## [1] "Metrics Performance output path:"
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method4_CN_vs_AD\\Method4_CN_vs_AD_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"

Appendix - Variables

Overview of the Data Frame Variables.

  • Phenotype part data frame: “phenoticPart_RAW”

  • Raw merged data frame: “merged_df_raw”

  • Processed data, i.e. the data used for model training.

    • The name for “processed_data” could be:

      • “processed_data_m1”, which uses method one to process the data.

      • “processed_data_m2”, which uses method two to process the data; note that the features will be principal components.

      • “processed_data_m3”, which uses method three to process the data. This method transfers “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are transferred to “CI”.

        Note that “processed_data_m3_df” is the data frame format of “processed_data_m3”, with sample names as row names, and will be assigned to “processed_dataFrame”.

      • “processed_data_m4”, which uses method four to process the data. This method filters “DX” (drops the “MCI” class), limiting it to the CN and Dementia (AD) classes.

      • “processed_data_m5”, which uses method five to process the data. This method filters “DX” (drops the “Dementia” class), limiting it to the CN and MCI classes.

      • “processed_data_m6”, which uses method six to process the data. This method filters “DX” (drops the “CN” class), limiting it to the MCI and Dementia classes.

    • The name for “AfterProcess_FeatureName” could be:

      • “AfterProcess_FeatureName_m1”, which holds the column names of the data frame processed with method one.
      • “AfterProcess_FeatureName_m2”, which holds the column names for the principal component method.
      • “AfterProcess_FeatureName_m3”, which holds the column names of the data frame processed with method three (transfers “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are transferred to “CI”).
      • “AfterProcess_FeatureName_m4”, which holds the column names of the data frame processed with method four (drops the “MCI” class, limiting “DX” to the CN and Dementia (AD) classes).
      • “AfterProcess_FeatureName_m5”, which holds the column names of the data frame processed with method five (drops the “Dementia” class, limiting “DX” to the CN and MCI classes).
      • “AfterProcess_FeatureName_m6”, which holds the column names of the data frame processed with method six (drops the “CN” class, limiting “DX” to the MCI and Dementia classes).
  • Ordered feature importance data frame (quantile-based): “combined_importance_quantiles”

  • Ordered feature importance data frame (mean-based): “combined_importance_Avg_ordered”

  • Feature frequency / common-feature data frames:

    • “frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count. The top number of features selected in the first step is set in the input session via “INPUT_NUMBER_FEATURES”.

    • “feature_df_full”: the frequencies of all features from the steps of the frequency method; not ordered.

    • “all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.

  • Output data frame with features selected by the mean method: “df_selected_Mean”. This data frame does not have a column named “SampleID”.

    • The feature names: “selected_impAvg_ordered_NAME”
  • Output data frame with features selected by the median method: “df_selected_Median”. This data frame does not have a column named “SampleID”.

    • The feature names: “Selected_median_imp_Name”
  • Output data frame with features selected by the frequency / common-feature method: “df_process_Output_freq”. This data frame does not have a column named “SampleID”.

    • The feature names: “df_process_frequency_FeatureName”

    • “df_feature_Output_frequency”: the selected features’ frequencies, ordered by total frequency count. The top number of features selected in the first step is set in the input session via “NUM_COMMON_FEATURES_SET_Frequency”.

    • “Selected_Frequency_Feature_importance”: the importance values of the selected features, ordered by total frequency count.

    • “feature_output_df_full”: the frequencies of all features from the steps of the frequency method; not ordered.

    • “all_Output_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
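The frequency / common-feature data frames above follow the three-step frequency method: select each model's top-N features, count how often each feature appears across the models' lists, and keep the features that appear in more than half of the models. A minimal base-R sketch, using hypothetical feature lists rather than the notebook's actual objects:

```r
# Hypothetical per-model top-N feature lists (in the pipeline these come from
# each trained model's feature-importance ranking; names here are made up).
top_features_per_model <- list(
  LRM = c("cg001", "cg002", "cg003", "cg004"),
  EN  = c("cg001", "cg002", "cg005", "cg006"),
  XGB = c("cg001", "cg003", "cg005", "cg007"),
  RF  = c("cg002", "cg003", "cg006", "cg008"),
  SVM = c("cg001", "cg002", "cg003", "cg009")
)

# Step 2: count how often each feature appears across the models' top-N lists.
feature_freq <- table(unlist(top_features_per_model))

# Step 3: keep features that appear in more than half of the models.
n_models <- length(top_features_per_model)
common_features <- names(feature_freq)[feature_freq > n_models / 2]

# Order by frequency, mirroring "frequency_feature_df_RAW_ordered".
frequency_feature_df <- data.frame(
  Feature = names(feature_freq),
  Freq    = as.integer(feature_freq)
)
frequency_feature_df <- frequency_feature_df[order(-frequency_feature_df$Freq), ]
```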

Overview of the Metrics Variables.

  • Number of CpGs used: “Number_N_TopNCpGs”

  • Phenotype features selected:

    • Multi: “age.now”,“PTGENDER”, “PC1”,“PC2”,“PC3” (Total number: 5)
    • Binary: “age.now”,“PTGENDER”,“PC1”,“PC2”,“PC3” (Total number: 5)
  • Number of features before processing: (#Phenotype features selected) + (#CpGs Used)

  • Number of features after processing (DMP, data cleaning): “Num_feaForProcess”

  • Model performance (variable names), model training stage (models in order: Logistic regression, Elastic Net, XGBoost, Random Forest, SVM):

    • Training Accuracy: modelTrain_LRM1_trainAccuracy, modelTrain_ENM1_trainAccuracy, modelTrain_xgb_trainAccuracy, modelTrain_rf_trainAccuracy, modelTrain_svm_trainAccuracy
    • Test Accuracy: cm_modelTrain_LRM1_Accuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_svm_Accuracy
    • Test Kappa: cm_modelTrain_LRM1_Kappa, cm_modelTrain_ENM1_Kappa, cm_modelTrain_xgb_Kappa, cm_modelTrain_rf_Kappa, cm_modelTrain_svm_Kappa
    • AUC (for multi-class, the mean AUC with the one-vs-rest method): modelTrain_LRM1_AUC, modelTrain_ENM1_AUC, modelTrain_xgb_AUC, modelTrain_rf_AUC, modelTrain_svm_AUC
    • Average Test Accuracy during Cross Validation: modelTrain_mean_accuracy_cv_LRM1, modelTrain_mean_accuracy_cv_ENM1, modelTrain_mean_accuracy_cv_xgb, modelTrain_mean_accuracy_cv_rf, modelTrain_mean_accuracy_cv_svm
  • Number of key features selected (mean/median methods): “INPUT_NUMBER_FEATURES”

  • Number of key features remaining based on the frequency method: “Num_KeyFea_Frequency”

  • Performance of the set of key features (selected under the three methods):

    Based on Mean (models in order: Logistic Regression, Elastic Net, XGBoost, Random Forest, SVM):

    • Training Accuracy: FeatEval_Mean_LRM1_trainAccuracy, FeatEval_Mean_ENM1_trainAccuracy, FeatEval_Mean_xgb_trainAccuracy, FeatEval_Mean_rf_trainAccuracy, FeatEval_Mean_svm_trainAccuracy
    • Test Accuracy: cm_FeatEval_Mean_LRM1_Accuracy, cm_FeatEval_Mean_ENM1_Accuracy, cm_FeatEval_Mean_xgb_Accuracy, cm_FeatEval_Mean_rf_Accuracy, cm_FeatEval_Mean_svm_Accuracy
    • Test Kappa: cm_FeatEval_Mean_LRM1_Kappa, cm_FeatEval_Mean_ENM1_Kappa, cm_FeatEval_Mean_xgb_Kappa, cm_FeatEval_Mean_rf_Kappa, cm_FeatEval_Mean_svm_Kappa
    • AUC (for multi-class, the mean AUC with the one-vs-rest method): FeatEval_Mean_LRM1_AUC, FeatEval_Mean_ENM1_AUC, FeatEval_Mean_xgb_AUC, FeatEval_Mean_rf_AUC, FeatEval_Mean_svm_AUC
    • Average Test Accuracy during Cross Validation: FeatEval_Mean_mean_accuracy_cv_LRM1, FeatEval_Mean_mean_accuracy_cv_ENM1, FeatEval_Mean_mean_accuracy_cv_xgb, FeatEval_Mean_mean_accuracy_cv_rf, FeatEval_Mean_mean_accuracy_cv_svm

    Based on Median (models in order: Logistic Regression, Elastic Net, XGBoost, Random Forest, SVM):

    • Training Accuracy: FeatEval_Median_LRM1_trainAccuracy, FeatEval_Median_ENM1_trainAccuracy, FeatEval_Median_xgb_trainAccuracy, FeatEval_Median_rf_trainAccuracy, FeatEval_Median_svm_trainAccuracy
    • Test Accuracy: cm_FeatEval_Median_LRM1_Accuracy, cm_FeatEval_Median_ENM1_Accuracy, cm_FeatEval_Median_xgb_Accuracy, cm_FeatEval_Median_rf_Accuracy, cm_FeatEval_Median_svm_Accuracy
    • Test Kappa: cm_FeatEval_Median_LRM1_Kappa, cm_FeatEval_Median_ENM1_Kappa, cm_FeatEval_Median_xgb_Kappa, cm_FeatEval_Median_rf_Kappa, cm_FeatEval_Median_svm_Kappa
    • AUC (for multi-class, the mean AUC with the one-vs-rest method): FeatEval_Median_LRM1_AUC, FeatEval_Median_ENM1_AUC, FeatEval_Median_xgb_AUC, FeatEval_Median_rf_AUC, FeatEval_Median_svm_AUC
    • Average Test Accuracy during Cross Validation: FeatEval_Median_mean_accuracy_cv_LRM1, FeatEval_Median_mean_accuracy_cv_ENM1, FeatEval_Median_mean_accuracy_cv_xgb, FeatEval_Median_mean_accuracy_cv_rf, FeatEval_Median_mean_accuracy_cv_svm

    Based on Frequency (models in order: Logistic Regression, Elastic Net, XGBoost, Random Forest, SVM):

    • Training Accuracy: FeatEval_Freq_LRM1_trainAccuracy, FeatEval_Freq_ENM1_trainAccuracy, FeatEval_Freq_xgb_trainAccuracy, FeatEval_Freq_rf_trainAccuracy, FeatEval_Freq_svm_trainAccuracy
    • Test Accuracy: cm_FeatEval_Freq_LRM1_Accuracy, cm_FeatEval_Freq_ENM1_Accuracy, cm_FeatEval_Freq_xgb_Accuracy, cm_FeatEval_Freq_rf_Accuracy, cm_FeatEval_Freq_svm_Accuracy
    • Test Kappa: cm_FeatEval_Freq_LRM1_Kappa, cm_FeatEval_Freq_ENM1_Kappa, cm_FeatEval_Freq_xgb_Kappa, cm_FeatEval_Freq_rf_Kappa, cm_FeatEval_Freq_svm_Kappa
    • AUC (for multi-class, the mean AUC with the one-vs-rest method): FeatEval_Freq_LRM1_AUC, FeatEval_Freq_ENM1_AUC, FeatEval_Freq_xgb_AUC, FeatEval_Freq_rf_AUC, FeatEval_Freq_svm_AUC
    • Average Test Accuracy during Cross Validation: FeatEval_Freq_mean_accuracy_cv_LRM1, FeatEval_Freq_mean_accuracy_cv_ENM1, FeatEval_Freq_mean_accuracy_cv_xgb, FeatEval_Freq_mean_accuracy_cv_rf, FeatEval_Freq_mean_accuracy_cv_svm
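For reference, the metrics listed in these tables (Test Accuracy, Test Kappa, AUC) can be computed as sketched below in base R. The label and probability vectors here are hypothetical illustrations, not the notebook's actual hold-out predictions:

```r
# Hypothetical hold-out labels, predictions, and probabilities (binary task).
truth   <- c("CN", "CN", "AD", "AD", "CN", "AD")
pred    <- c("CN", "CN", "AD", "CN", "CN", "AD")
prob_AD <- c(0.1, 0.2, 0.9, 0.4, 0.3, 0.8)  # predicted P(class = "AD")

# Test Accuracy: fraction of correct predictions.
test_accuracy <- mean(pred == truth)

# Cohen's Kappa: observed vs. chance agreement from the confusion table.
tab   <- table(pred, truth)
p_obs <- sum(diag(tab)) / sum(tab)
p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
test_kappa <- (p_obs - p_exp) / (1 - p_exp)

# AUC: probability that a random "AD" case scores above a random "CN" case
# (ties counted as 0.5).
pos <- prob_AD[truth == "AD"]
neg <- prob_AD[truth == "CN"]
auc_value <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
```

In practice the pipeline's cm_* variables would come from a confusion-matrix helper such as caret's, and the multi-class AUC would average one-vs-rest AUCs as noted in the tables above.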